by GaggiX 50 days ago
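The model mixes Gated DeltaNet and Gated Attention layers. The Gated DeltaNet layers keep a fixed-size recurrent state instead of a per-token cache, so only the Gated Attention layers contribute a KV cache that grows with context length. That keeps KV cache memory very low, even at BF16 precision.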
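For a rough sense of scale, here is a back-of-the-envelope sketch of the KV cache size. Every number in it (layer counts, KV heads, head dimension, context length) is a made-up placeholder, not the actual configuration of the model, and the function name is hypothetical. It only counts the layers that keep a per-token KV cache and ignores the constant-size recurrent state of the Gated DeltaNet layers.

```python
def kv_cache_bytes(num_attn_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    Only layers with a per-token KV cache are counted. bytes_per_elem=2
    corresponds to BF16. The leading factor of 2 covers keys and values.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configuration, chosen only for illustration.
TOTAL_LAYERS = 48
ATTN_LAYERS_IN_HYBRID = 12  # assumed: the remaining layers are Gated DeltaNet
KV_HEADS = 2
HEAD_DIM = 256
SEQ_LEN = 32_768

full = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(ATTN_LAYERS_IN_HYBRID, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"all-attention: {full / 2**30:.2f} GiB")
print(f"hybrid:        {hybrid / 2**30:.2f} GiB")
```

Under these assumptions the cache shrinks in proportion to the share of layers that are full attention, since the rest add nothing that grows with sequence length.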