by chillee 843 days ago
Who is this :think:

But no, FlashAttention already solved the memory requirements of attention. RingAttention is primarily useful for parallelizing across the sequence dimension.
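To make the memory point concrete, here is a rough NumPy sketch of the online-softmax trick that FlashAttention builds on: keys and values are processed one block at a time, and a running max and running sum per query row let you rescale the partial output, so the full score matrix is never held at once. This is only an illustration under my own assumptions, not the actual kernel (which also tiles the queries and keeps the blocks in on-chip SRAM); the names `blockwise_attention`, `naive_attention`, and `block_size` are made up for the example.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference version: materializes the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def blockwise_attention(q, k, v, block_size=64):
    """Same result, but only one (n_q, block_size) score tile exists at a time."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n_q, v.shape[-1]))       # unnormalized running output
    row_max = np.full((n_q, 1), -np.inf)     # running max of scores per query
    row_sum = np.zeros((n_q, 1))             # running softmax denominator

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = (q @ k_blk.T) * scale            # scores for this block only

        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        # Rescale everything accumulated so far to the new max.
        correction = np.exp(row_max - new_max)
        p = np.exp(s - new_max)

        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum
```

Each key/value block is read once per pass over the sequence, so nothing gets loaded and unloaded repeatedly; the per-row state that carries over between blocks is just `row_max`, `row_sum`, and `out`.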


It's camel.

How do you do matrix-vector attention without keeping the full matrix in cache? Surely you don't just load and unload it a million times.