by marcyb5st
1277 days ago
From the FlashAttention paper: "We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method." So I assume they are using the approximate version, since the paper also has an exact one.