by GaggiX 2 days ago
Not to be confused with Flash Attention.

What's novel here is the extremely small KV cache memory usage at long context windows: about 0.77GB at 512K, a 90% reduction compared to the already very small KV cache of Deepseek V4 Flash.
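For a rough sense of scale, here is a back-of-the-envelope sketch of what those two numbers imply. It assumes GB means 10^9 bytes, 512K means 512 * 1024 tokens, and that the 90% reduction is measured at the same context length; none of that is stated in the comment, and the function names are just illustrative.

```python
def kv_cache_bytes_per_token(total_gb: float, context_tokens: int) -> float:
    """Average KV cache bytes per token (assumes 1 GB = 1e9 bytes)."""
    return total_gb * 1e9 / context_tokens


def implied_baseline_gb(reduced_gb: float, reduction_fraction: float) -> float:
    """Baseline size implied by a reduced size and a fractional reduction."""
    return reduced_gb / (1.0 - reduction_fraction)


# Figures quoted in the comment; 512K is assumed to be 512 * 1024 tokens.
context_tokens = 512 * 1024
per_token = kv_cache_bytes_per_token(0.77, context_tokens)
baseline = implied_baseline_gb(0.77, 0.90)

print(f"~{per_token:.0f} bytes of KV cache per token")
print(f"~{baseline:.1f} GB implied baseline at the same context length")
```

Under those assumptions that works out to roughly 1.5 KB of KV cache per token, against an implied baseline of around 7.7GB for the same window.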