by GaggiX
2 days ago
Not to be confused with Flash Attention. What's novel here is the extremely small KV cache memory footprint for long context windows: e.g. 0.77 GB at a 512K context, a 90% reduction compared to the already very small KV cache of DeepSeek V4 Flash.
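For a sense of scale, here is a quick back-of-envelope sketch of what those two numbers imply. It assumes "512K" means 512 × 1024 tokens and "GB" means 10^9 bytes (neither is stated above). It also reads "90% reduction" as new = 0.1 × baseline. The function names are just illustrative.

```python
# Back-of-envelope arithmetic for the figures quoted in the comment.
# Assumptions (not stated in the comment): "512K" = 512 * 1024 tokens,
# "GB" = 10**9 bytes, and "90% reduction" means new = (1 - 0.9) * baseline.

CONTEXT_TOKENS = 512 * 1024   # assumed interpretation of "512K"
KV_CACHE_BYTES = 0.77e9       # 0.77 GB, as quoted
REDUCTION = 0.90              # "90% memory usage reduction"


def bytes_per_token(total_bytes: float, tokens: int) -> float:
    """Average KV cache footprint per token of context."""
    return total_bytes / tokens


def implied_baseline(new_bytes: float, reduction: float) -> float:
    """Baseline size implied if new_bytes = (1 - reduction) * baseline."""
    return new_bytes / (1.0 - reduction)


if __name__ == "__main__":
    per_token = bytes_per_token(KV_CACHE_BYTES, CONTEXT_TOKENS)
    baseline = implied_baseline(KV_CACHE_BYTES, REDUCTION)
    print(f"~{per_token:.0f} bytes of KV cache per token")
    print(f"implied baseline: ~{baseline / 1e9:.1f} GB at the same context length")
```

Under those assumptions, that works out to roughly 1.5 KB of cache per token. The implied baseline for the comparison model at the same context length is about 7.7 GB.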