Hacker News
by marcyb5st 1277 days ago
From the FlashAttention paper:

We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

So I assume they are using the approximate (block-sparse) version, since the paper also describes an exact version.
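To make the exact vs. approximate distinction concrete, here is a rough sketch of my own (not code from the paper). It only illustrates the block-level masking idea: with a block mask, each query block attends to a subset of key blocks, so the result approximates full attention. The function names, the `block` size, and the block-diagonal mask are assumptions for illustration, and this does not reproduce FlashAttention's tiled, IO-aware kernel.

```python
import math

def softmax(row):
    # Numerically stable softmax; masked scores are -inf and get weight 0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, block_mask=None, block=2):
    """Reference attention over lists of row vectors.

    If block_mask is given (hypothetical layout: block_mask[qb][kb]),
    query block qb only attends to key blocks kb marked True.
    Every query block must keep at least one key block.
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if block_mask is not None and not block_mask[i // block][j // block]:
                scores.append(-math.inf)  # skipped block
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        w = softmax(scores)
        out.append([sum(w[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
V = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

exact = attention(Q, K, V)                      # every query sees every key
block_diag = [[True, False], [False, True]]     # assumed sparsity pattern
approx = attention(Q, K, V, block_mask=block_diag)
```

With this mask, the first two queries never see the last two keys, so `approx` differs from `exact`. The speedup in the real algorithm comes from not computing the skipped blocks at all, whereas this sketch still loops over them.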

1 comment

Thanks for that. I missed the block-sparse extension of the algorithm when I first read the paper. And indeed, this seems to be what the author means.