by Eisenstein 496 days ago
We are talking about dynamically quantizing KV cache, not the model weights.
1 comment

I run the KV cache at Q8 even on that model. Is it not working well for you?
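
For anyone unsure of the distinction, here is a rough sketch of what 8-bit KV cache quantization looks like. It assumes "Q8" means a blockwise scheme in the style of llama.cpp's Q8_0 (32 values per block, one scale per block); the function names and the random test vector are made up for illustration, and the scale is kept as a plain Python float rather than fp16. The model weights are never touched, only the cached key/value activations are compressed as they are written.

```python
import random

BLOCK = 32  # values per block (assumption, modeled on a Q8_0-style layout)

def quantize_q8(values):
    """Quantize a flat list of floats into (scale, int8 values) blocks."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        # One scale per block so the largest magnitude maps to +/-127.
        scale = amax / 127.0 if amax > 0 else 1.0
        q = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, q))
    return blocks

def dequantize_q8(blocks):
    """Recover approximate floats from (scale, int8 values) blocks."""
    out = []
    for scale, q in blocks:
        out.extend(scale * x for x in q)
    return out

# Hypothetical stand-in for one cached key vector (not real model data).
random.seed(0)
kv_row = [random.gauss(0.0, 1.0) for _ in range(128)]

blocks = quantize_q8(kv_row)
restored = dequantize_q8(blocks)
max_err = max(abs(a - b) for a, b in zip(kv_row, restored))
print(f"{len(blocks)} blocks, max abs error {max_err:.5f}")
```

The rounding error per value is at most half a quantization step, i.e. about 1/254 of the largest magnitude in its block, which is why Q8 on the cache is usually hard to notice in practice.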