by ggerganov 1277 days ago
> There has also been a wide variety of accuracy-degrading performance optimizations like Xformers and Flash Attention, which are great tools if you are open to trading accuracy for performance ..

I wasn't aware that Flash Attention trades accuracy for performance. Either I have a wrong understanding of what FA is, or this statement is not fully accurate.

Either way - looks like great work


From the flash attention paper:

> We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.

So I assume the author is referring to the approximate (block-sparse) version, since the paper also has an exact version.
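To make the exact-versus-approximate distinction concrete, here is a toy sketch in plain Python. It is not the paper's kernel: it handles a single query vector, updates the running softmax one key at a time rather than one tile at a time, and the `keep_block` predicate is a hypothetical stand-in for a block-sparse mask. It only shows that processing keys block by block with a running softmax matches ordinary attention, while skipping blocks changes the result.

```python
import math

def naive_attention(q, keys, values):
    """Reference softmax attention for a single query vector."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def tiled_attention(q, keys, values, block=2, keep_block=None):
    """Walk over keys/values block by block with a running softmax.

    keep_block is a hypothetical predicate on the block index; blocks for
    which it returns False are skipped, mimicking a block-sparse mask.
    At least one block must be kept, otherwise the denominator is zero.
    """
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        if keep_block is not None and not keep_block(start // block):
            continue
        for k, v in zip(keys[start:start + block], values[start:start + block]):
            s = sum(a * b for a, b in zip(q, k))
            new_max = max(running_max, s)
            scale = math.exp(running_max - new_max)  # rescale earlier partial sums
            w = math.exp(s - new_max)
            denom = denom * scale + w
            acc = [a * scale + w * x for a, x in zip(acc, v)]
            running_max = new_max
    return [a / denom for a in acc]

q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

exact = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values)  # all blocks kept
sparse = tiled_attention(q, keys, values, keep_block=lambda i: i == 0)

print("naive :", exact)
print("tiled :", tiled)
print("sparse:", sparse)
```

With every block kept, the tiled result agrees with the naive one up to floating-point rounding; with only the first block kept, the output is a different (approximate) answer. That is the sense in which only the block-sparse variant trades accuracy for speed.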

Thanks for that - I missed the block-sparse extension of the algorithm when I first read about it. And indeed, this seems to be what the author means.