by x_may 84 days ago
It's KV cache compression, so it reduces how much memory the model needs as its context grows. It does not affect the size of the weights.
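To put rough numbers on that, here's a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dim 128, 8B params, fp16) and the 4x compression ratio are made-up illustrative values, not figures from the work being discussed. The point is just that the cache term scales with context length and the weight term doesn't.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Uncompressed KV cache size: one K and one V vector per layer,
    per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def weight_bytes(n_params, bytes_per_param=2):
    """Weight memory is independent of context length."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    n_params = 8_000_000_000
    ratio = 4  # assumed KV compression ratio, not a measured result

    gib = 2 ** 30
    print(f"weights: {weight_bytes(n_params) / gib:.1f} GiB (same at any context)")
    for seq_len in (8_192, 32_768, 131_072):
        full = kv_cache_bytes(seq_len=seq_len, **cfg)
        print(f"ctx {seq_len:>7}: KV cache {full / gib:5.1f} GiB "
              f"-> {full / ratio / gib:5.1f} GiB compressed")
```

With these assumed numbers the cache is 1 GiB at 8K tokens and 16 GiB at 128K, while the weights stay put either way.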