Hacker News
by
Eisenstein
496 days ago
We are talking about dynamically quantizing KV cache, not the model weights.
1 comment
behohippy
495 days ago
I run the KV cache at Q8 even on that model. Is it not working well for you?
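
For anyone following along, here is a rough sketch of what "Q8 on the KV cache" means in practice: the model weights stay as they are, and only the key/value vectors are rounded to 8-bit integers as they are appended during generation. This is a toy illustration in plain Python, not any particular runtime's implementation. The block size of 32, the function names `quantize_q8` and `dequantize_q8`, and the random "key vectors" are all assumptions made up for the example.

```python
import random

BLOCK = 32  # assumed block size, chosen only for illustration


def quantize_q8(values):
    """Split floats into blocks; store one float scale plus int8 codes per block."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        # Map the largest magnitude in the block to 127; avoid dividing by zero.
        scale = amax / 127.0 if amax > 0 else 1.0
        codes = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8(blocks):
    """Rebuild approximate floats from (scale, codes) blocks."""
    out = []
    for scale, codes in blocks:
        out.extend(scale * c for c in codes)
    return out


# Toy cache: each generated token appends one quantized key vector.
# The (hypothetical) model weights are never touched here.
random.seed(0)
kv_cache = []
for _ in range(4):
    key_vec = [random.gauss(0.0, 1.0) for _ in range(64)]
    kv_cache.append(quantize_q8(key_vec))

# At attention time the stored entry is dequantized before use.
restored = dequantize_q8(kv_cache[0])
```

Each restored value is off by at most half a quantization step (half of that block's scale), which is why 8-bit caches are often hard to tell apart from full precision while taking far less memory.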