| HN Mirror



	by chillee 890 days ago
	Who is this :think: But no, FlashAttention already solved the memory requirements of attention. RingAttention is primarily useful for parallelizing across the sequence dimension.

1 comment

It's camel.

How do you do matrix-vector attention without keeping the full matrix in cache? Surely you don't just load and unload it a million times.
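
For what it's worth, here is a rough sketch of the online-softmax idea that FlashAttention-style kernels build on, for the single-query (matrix-vector) case asked about above. It is a toy in plain Python, not the real GPU kernel: the function names `streaming_attention` and `reference_attention`, the `block_size` parameter, and the assumption of non-empty inputs are all mine for illustration. Each key/value block is read once, and only a running max, a running sum, and a small accumulator are kept around.

```python
import math


def streaming_attention(q, keys, values, block_size=2):
    """Attention for one query vector, reading keys/values one block at a time.

    Hypothetical helper for illustration; assumes keys and values are non-empty.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        # Scores for this block only; the full score vector is never stored.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_block]

        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max

    return [a / running_sum for a in acc]


def reference_attention(q, keys, values):
    """Plain softmax attention that materializes every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]
```

The two functions should agree up to floating-point error for any block size, which is the point: the blockwise version gets the same softmax result without ever holding all the scores at once.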