HN Mirror



	by casercaramel144 891 days ago
	Huh? I thought the issue before RingAttention was the memory requirement of the softmax layer, since you have to load the whole matrix in at once? It's O(s^2), no? Also, hi Horace.
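To put numbers on the O(s^2) concern, here is a back-of-envelope sketch of how much memory naive attention needs just to materialize the s-by-s score matrix. It assumes 2-byte (fp16/bf16) scores and counts a single head of a single batch element; `score_matrix_bytes` is a hypothetical helper for illustration.

```python
# Memory needed to hold the full s x s attention-score matrix in naive attention.
# Assumptions: 2 bytes per score (fp16/bf16), one head, one batch element.

def score_matrix_bytes(seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for one s x s score matrix; grows as O(s^2)."""
    return seq_len * seq_len * bytes_per_elem

for s in (4_096, 32_768, 131_072):
    gib = score_matrix_bytes(s) / 2**30
    print(f"s={s:>7}: {gib:8.3f} GiB per head")
```

Under these assumptions, 32K tokens already needs 2 GiB per head and 128K tokens (2^17) needs 32 GiB per head, which is why avoiding materializing the matrix matters.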


chillee 890 days ago

Who is this? :think:

But no, FlashAttention already solved the memory requirements of attention: it never materializes the full s-by-s score matrix. RingAttention is primarily useful for parallelizing across the sequence dimension.
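To illustrate what "parallelizing across the sequence dimension" means, here is a toy single-process simulation in the spirit of RingAttention. The sequence is split into equal chunks, each simulated "device" keeps its own query chunk, and key/value chunks are passed around a ring; each device folds every block it sees into a running (online) softmax, so after one full trip around the ring it holds exact attention outputs for its queries. This is a sketch under simplifying assumptions (no causal mask, no real communication or compute/transfer overlap, plain Python lists), and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_attention(Q, K, V):
    """Reference softmax(Q K^T / sqrt(d)) V, computing every score per query row."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [dot(q, k) * scale for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[d] for wj, v in zip(w, V)) / z for d in range(len(V[0]))])
    return out

def online_update(state, q, K_blk, V_blk, scale):
    """Fold one K/V block into a query's running (max, denominator, numerator) state."""
    m, l, acc = state
    for k, v in zip(K_blk, V_blk):
        s = dot(q, k) * scale
        m_new = max(m, s)
        corr = math.exp(m - m_new)  # rescale earlier partial sums to the new max
        p = math.exp(s - m_new)
        l = l * corr + p
        acc = [a * corr + p * vd for a, vd in zip(acc, v)]
        m = m_new
    return (m, l, acc)

def ring_attention(Q, K, V, num_devices):
    """Simulated ring: device i owns query chunk i; K/V chunks rotate one hop per step."""
    n = len(Q)
    if n % num_devices:
        raise ValueError("sequence length must divide evenly across devices")
    c = n // num_devices

    def chunk(X, i):
        return X[i * c:(i + 1) * c]

    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    states = [[(-math.inf, 0.0, [0.0] * d_v) for _ in range(c)] for _ in range(num_devices)]
    held = list(range(num_devices))  # device i starts out holding K/V chunk i
    for _step in range(num_devices):
        for dev in range(num_devices):
            Kb, Vb = chunk(K, held[dev]), chunk(V, held[dev])
            states[dev] = [online_update(st, q, Kb, Vb, scale)
                           for st, q in zip(states[dev], chunk(Q, dev))]
        # every device passes its K/V chunk to the next device in the ring
        held = [held[(dev - 1) % num_devices] for dev in range(num_devices)]
    out = []
    for dev in range(num_devices):
        out.extend([[a / l for a in acc] for (_m, l, acc) in states[dev]])
    return out
```

For example, `ring_attention(Q, K, V, 4)` should match `naive_attention(Q, K, V)` up to floating-point error, while each simulated device only ever touches one K/V chunk at a time. The per-device memory win here comes from sharding the sequence, not from a new way of computing softmax.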


casercaramel144 890 days ago

It's camel.

How do you do matrix-vector attention without keeping the full matrix in cache? Surely you don't just load and unload it a million times.
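One way to see how tiled attention in the style of FlashAttention avoids this: the score matrix is never loaded or stored at all. Scores are computed one small tile at a time from a Q tile and a K tile, used to update running softmax statistics, and then discarded. K/V tiles are streamed once per query tile (roughly s / block_q passes over K/V, not once per score). The sketch below is a pure-Python illustration of that idea, not FlashAttention's actual kernel; it uses one common loop order (queries outer, keys inner), and `tiled_attention`, `block_q`, and `block_k` are hypothetical names.

```python
import math

def tiled_attention(Q, K, V, block_q=2, block_k=2):
    """Tiling sketch: at most a block_q x block_k tile of scores exists at a time;
    the full len(Q) x len(K) score matrix is never stored."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    out = []
    for qs in range(0, len(Q), block_q):           # outer loop: one query tile
        Q_tile = Q[qs:qs + block_q]
        m = [-math.inf] * len(Q_tile)             # running row max
        l = [0.0] * len(Q_tile)                   # running softmax denominator
        acc = [[0.0] * d_v for _ in Q_tile]       # running unnormalized output
        for ks in range(0, len(K), block_k):       # inner loop: stream K/V tiles
            K_tile, V_tile = K[ks:ks + block_k], V[ks:ks + block_k]
            S = [[sum(a * b for a, b in zip(q, k)) * scale for k in K_tile]
                 for q in Q_tile]                  # one small tile of scores
            for i, row in enumerate(S):
                m_new = max(m[i], max(row))
                corr = math.exp(m[i] - m_new)     # rescale what was accumulated so far
                p = [math.exp(s - m_new) for s in row]
                l[i] = l[i] * corr + sum(p)
                acc[i] = [a * corr + sum(pj * v[d] for pj, v in zip(p, V_tile))
                          for d, a in enumerate(acc[i])]
                m[i] = m_new
            # S is discarded after this tile; nothing of size len(Q) x len(K) is kept
        out.extend([[a / l[i] for a in acc[i]] for i in range(len(Q_tile))])
    return out
```

The rescaling by `corr` is what makes the streaming exact: when a later tile raises the running max, earlier contributions are scaled down to match, so the final division by `l` gives the same result as a full softmax.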
