by Kubuxu 613 days ago
It would double the size of the KV cache, which can be significant (multi-GB) at larger context sizes.
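For a rough sense of scale, here is a back-of-the-envelope sketch. The model shape (32 layers, 32 KV heads, head dimension 128, fp16 entries, 8192-token context) is an assumption picked for illustration, not something from the thread, and `kv_cache_bytes` is a hypothetical helper, not a library call.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2, batch=1):
    """Approximate KV cache size for a decoder-only transformer.

    Each layer stores two tensors (K and V), each of shape
    [batch, n_kv_heads, context_len, head_dim].
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch


GIB = 1024 ** 3

# Assumed, illustrative model shape (not taken from the discussion).
base = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, context_len=8192)

print(f"baseline: {base / GIB:.1f} GiB")      # baseline: 4.0 GiB
print(f"doubled:  {2 * base / GIB:.1f} GiB")  # doubled:  8.0 GiB
```

Under those assumptions the cache is already about 4 GiB per sequence, so doubling it costs another 4 GiB, and the gap grows linearly with context length and batch size. Models that use grouped-query attention have fewer KV heads and would land lower.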