| HN Mirror



	by ipsum2 1100 days ago
	The ideas are orthogonal, and can be used (theoretically) at the same time.

1 comment

scv119 1100 days ago

I believe you could adapt the FlashAttention kernel with small changes to implement the PagedAttention kernel, since both of them operate on the key/value cache at block level.
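
To make the block-level point concrete, here is a rough sketch in plain Python, not the actual kernel from either project. It assumes a single query vector, a KV cache split into fixed-size physical blocks, and a block table mapping logical blocks to physical ones (the paging part). The loop keeps a running softmax per block (the FlashAttention-style part). All names (`paged_attention`, `block_table`, `k_blocks`, `v_blocks`) are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def paged_attention(q, k_blocks, v_blocks, block_table, seq_len, block_size):
    """Attention for one query over a KV cache stored in non-contiguous blocks.

    Walks the cache block by block and keeps a running (online) softmax,
    so the full score vector is never materialized.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(v_blocks[0][0])

    for logical_idx, physical_idx in enumerate(block_table):
        start = logical_idx * block_size
        n_valid = min(block_size, seq_len - start)  # last block may be partial
        if n_valid <= 0:
            break
        # Paging: the block table says where this logical block actually lives.
        keys = k_blocks[physical_idx][:n_valid]
        values = v_blocks[physical_idx][:n_valid]

        scores = [scale * dot(q, k) for k in keys]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum
        # (exp(-inf) is 0.0, which handles the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max

    return [a / running_sum for a in acc]


def reference_attention(q, keys, values):
    """Plain softmax attention over a contiguous cache, for comparison."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


# Toy data: 5 tokens, head dimension 2, blocks of 2 tokens each.
BLOCK_SIZE = 2
query = [0.3, -0.7]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

# Logical blocks 0, 1, 2 live in physical blocks 2, 0, 3 (block 1 is unused).
block_table = [2, 0, 3]
pad = [0.0, 0.0]
k_blocks = [[pad, pad] for _ in range(4)]
v_blocks = [[pad, pad] for _ in range(4)]
for i, (k, v) in enumerate(zip(keys, values)):
    physical = block_table[i // BLOCK_SIZE]
    k_blocks[physical][i % BLOCK_SIZE] = k
    v_blocks[physical][i % BLOCK_SIZE] = v

paged = paged_attention(query, k_blocks, v_blocks, block_table, len(keys), BLOCK_SIZE)
reference = reference_attention(query, keys, values)
```

In this toy version the only difference from a contiguous block loop is the `block_table` lookup, which is roughly why the two ideas look composable. A real GPU kernel would also have to deal with memory layout, batching, and multiple heads, none of which is modeled here.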
