by karmasimida
1100 days ago
I think they are orthogonal. Flash attention is just another way to compute exact attention. This work mainly concerns how to resolve memory fragmentation across different sequences. You still need to compute attention as usual once you retrieve the needed keys and values.
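To make the point concrete, here is a rough toy sketch in plain Python, not taken from either project. The names `BLOCK_SIZE`, `key_pool`, `value_pool`, `block_table` and `gather` are all made up for illustration. The cache for one sequence lives in scattered fixed-size blocks, a block table records which blocks belong to it, and once the keys and values are gathered the attention math is the ordinary exact softmax attention.

```python
import math

BLOCK_SIZE = 2  # hypothetical number of tokens stored per cache block


def attention(q, keys, values):
    """Ordinary exact softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def gather(pool, block_table, seq_len):
    """Follow the block table to collect one sequence's entries in token order."""
    out = []
    for block_id in block_table:
        out.extend(pool[block_id])
    return out[:seq_len]  # drop unused slots in the last block


# Reference: keys and values stored contiguously.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q = [0.5, -0.5]

# Same data scattered over a pool of 4 physical blocks.
# This sequence owns block 3 (tokens 0-1) and block 1 (token 2).
pad = [0.0, 0.0]
key_pool = [[pad, pad], [keys[2], pad], [pad, pad], [keys[0], keys[1]]]
value_pool = [[pad, pad], [values[2], pad], [pad, pad], [values[0], values[1]]]
block_table = [3, 1]

paged = attention(q,
                  gather(key_pool, block_table, len(keys)),
                  gather(value_pool, block_table, len(values)))
reference = attention(q, keys, values)
```

The memory layout only changes `gather`; `attention` is untouched. That function could be replaced by a different exact-attention kernel without touching the block table, which is why the two ideas look independent.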