Hacker News
by GaggiX 50 days ago
The model uses Gated DeltaNet and Gated Attention, so the KV cache's memory usage is very low, even at BF16 precision.
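
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Every figure in it (layer count, the ratio of attention layers to linear-attention layers, KV head count, head dimension, context length) is a made-up assumption for illustration, not the actual configuration of the model. The idea is that only the full-attention layers keep a per-token KV cache, while linear-attention layers such as Gated DeltaNet keep a fixed-size state that does not grow with context length, so that state is left out of the estimate.

```python
def kv_cache_bytes(num_attn_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both K and V per layer.
    bytes_per_elem=2 corresponds to BF16.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configuration, chosen only to illustrate the arithmetic.
TOTAL_LAYERS = 48   # assumed total number of layers
ATTN_EVERY = 4      # assumed: one full-attention layer per 4 layers
NUM_KV_HEADS = 2    # assumed number of KV heads (grouped-query attention)
HEAD_DIM = 256      # assumed head dimension
SEQ_LEN = 32_768    # assumed context length in tokens

# Baseline: every layer is full attention and keeps a KV cache.
full = kv_cache_bytes(TOTAL_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

# Hybrid: only the full-attention layers keep a KV cache.
hybrid = kv_cache_bytes(TOTAL_LAYERS // ATTN_EVERY, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

GIB = 1024 ** 3
print(f"all-attention: {full / GIB:.2f} GiB")
print(f"hybrid:        {hybrid / GIB:.2f} GiB")
```

With these assumed numbers the hybrid layout needs a quarter of the cache of an all-attention stack at BF16, and the gap scales with whatever attention-to-linear ratio the real model uses.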