by ggerganov
1277 days ago
> There has also been a wide variety of accuracy-degrading performance optimizations like Xformers and Flash Attention, which are great tools if you are open to trading accuracy for performance.

I wasn't aware that Flash Attention trades accuracy for performance. Either I misunderstand what FA is, or this statement is not fully accurate. Either way, it looks like great work!
> We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.
So I assume they are using the approximate (block-sparse) version, since FlashAttention also comes in an exact version.
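To illustrate the exact-vs-approximate distinction, here is a minimal pure-Python sketch (not the real FlashAttention CUDA kernels). The exact variant only reorganizes the computation into key/value tiles with a running ("online") softmax, so it should match standard attention up to floating-point rounding. Skipping whole key blocks, as a block-sparse mask does, actually changes the result. The function names and the `keep_block` predicate are hypothetical, chosen just for this illustration.

```python
import math
import random


def naive_attention(q, K, V):
    """Standard softmax attention for a single query vector q."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d = len(V[0])
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(d)]


def tiled_attention(q, K, V, block_size, keep_block=None):
    """Block-by-block attention with an online softmax (FlashAttention-style tiling).

    keep_block is a hypothetical predicate on the block index; returning False
    skips that block, which models a block-sparse (approximate) mask.
    """
    scale = 1.0 / math.sqrt(len(q))
    d = len(V[0])
    m = -math.inf     # running max of the scores seen so far
    l = 0.0           # running sum of exp(score - m)
    acc = [0.0] * d   # running weighted sum of values, relative to exp(m)
    for b, start in enumerate(range(0, len(K), block_size)):
        if keep_block is not None and not keep_block(b):
            continue
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in Kb]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)  # rescale old statistics to the new max
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, Vb):
            w = math.exp(s - m_new)
            l += w
            for j in range(d):
                acc[j] += w * v[j]
        m = m_new
    return [a / l for a in acc]


def max_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


random.seed(0)
n, d = 64, 8
q = [random.gauss(0, 1) for _ in range(d)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

exact = naive_attention(q, K, V)
tiled = tiled_attention(q, K, V, block_size=16)
# Keep only even-numbered key blocks: a crude stand-in for a block-sparse mask.
sparse = tiled_attention(q, K, V, block_size=16, keep_block=lambda b: b % 2 == 0)

print(max_err(exact, tiled))   # rounding-level difference only
print(max_err(exact, sparse))  # much larger: half of the keys were ignored
```

The tiling by itself is an algebraic rearrangement of the softmax, so any accuracy trade-off would come from the block-sparse variant (or from running in lower precision), not from the tiling.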