by kristjansson
853 days ago
FlashAttention (and FlashAttention-2) [0] reduces attention's memory footprint from quadratic to linear in sequence length. Compute is still O(n^2) in length though, AFAIK, so we'd expect these long sequence lengths to take some time to compute. I'm a bit out of my depth, but I think ultra-long exact-attention work like this also has to answer some questions about where to put the KV cache before it can be used in practice?

[0]: https://arxiv.org/abs/2205.14135
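
For a rough sense of scale, here is a back-of-envelope sketch of both costs in Python. The model dimensions (32 layers, 32 heads, head size 128, fp16) are hypothetical and picked only for illustration, and the FLOP estimate counts just the QK^T and attention-times-V matmuls, so treat the numbers as order-of-magnitude at best.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size for one sequence: grows linearly in seq_len."""
    # Factor of 2: one tensor for keys, one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def attention_matmul_flops(seq_len, n_layers, n_heads, head_dim):
    """Approximate FLOPs for exact attention over a full sequence: quadratic in seq_len."""
    # QK^T and (softmax scores) @ V each cost about 2 * n^2 * d FLOPs per head,
    # counting a multiply-add as 2 FLOPs. Causal masking would roughly halve this.
    per_head = 2 * (2 * seq_len * seq_len * head_dim)
    return per_head * n_heads * n_layers


if __name__ == "__main__":
    # Hypothetical model shape, not taken from any specific paper.
    layers, heads, head_dim = 32, 32, 128
    for n in (8_000, 128_000, 1_000_000):
        cache_gb = kv_cache_bytes(n, layers, heads, head_dim) / 1e9
        pflops = attention_matmul_flops(n, layers, heads, head_dim) / 1e15
        print(f"n={n:>9,}  KV cache ~{cache_gb:,.1f} GB  attention ~{pflops:,.2f} PFLOPs")
```

Under these assumptions, doubling the context doubles the cache but quadruples the attention compute, and at very long contexts the cache for a single sequence no longer fits on one accelerator, which is the "where to put it" question.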