by marcyb5st 1277 days ago

From the flash attention paper:

> We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

So I assume they are using the approximate (block-sparse) version, since the paper also describes an exact version.
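To make the exact/approximate distinction concrete, here is a minimal pure-Python sketch. It is not the paper's tiled GPU kernel; the names `attention`, `block_mask`, and `block` are hypothetical and chosen only for illustration. It assumes that block-sparse attention simply skips the key blocks a mask disables, so a fully enabled mask reproduces exact attention and a sparser mask gives an approximation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V, block_mask=None, block=1):
    """Softmax attention over lists of vectors (illustrative only).

    block_mask[i][j] is True when query block i may attend to key block j.
    block_mask=None means dense (exact) attention. Each mask row must
    enable at least one key block.
    """
    d = len(Q[0])
    out = []
    for qi, q in enumerate(Q):
        # Keep only the keys whose block is enabled for this query's block.
        keep = [kj for kj in range(len(K))
                if block_mask is None or block_mask[qi // block][kj // block]]
        scores = [sum(a * b for a, b in zip(q, K[kj])) / math.sqrt(d)
                  for kj in keep]
        w = softmax(scores)
        out.append([sum(wi * V[kj][c] for wi, kj in zip(w, keep))
                    for c in range(len(V[0]))])
    return out

# Toy inputs: 4 tokens, 2 dimensions, block size 2 (so 2x2 blocks).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [-10.0, 5.0]]

dense_mask = [[True, True], [True, True]]    # every block enabled
diag_mask = [[True, False], [False, True]]   # only diagonal blocks enabled

exact = attention(Q, K, V)
full = attention(Q, K, V, dense_mask, block=2)
sparse = attention(Q, K, V, diag_mask, block=2)
```

With every block enabled, `full` matches `exact`. With only the diagonal blocks enabled, `sparse` differs from `exact`, because the skipped blocks no longer contribute to the softmax. That is the sense in which the block-sparse variant is approximate.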

1 comment

ggerganov 1277 days ago

Thanks for that. I missed the block-sparse extension of the algorithm when I first read about it, and indeed this seems to be what the author means.
