Hacker News
by behohippy 493 days ago
Qwen is a little fussy about sampler settings, but it does run well quantized. If you were getting infinite repetition loops, try dropping top_p a bit. I think Qwen likes lower temps too.
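For anyone not familiar with the two knobs, here is a rough sketch of what temperature and top_p do to the next-token distribution. It is plain Python, not any particular runtime's sampler; the function name `nucleus_candidates` and the default values are made up for illustration and are not recommended Qwen settings.

```python
import math

def nucleus_candidates(logits, temperature=0.7, top_p=0.8):
    """Return (token_index, probability) pairs that survive temperature + top-p filtering.

    Hypothetical helper for illustration only; the defaults are arbitrary.
    """
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    kept_mass = sum(p for _, p in kept)
    return [(index, p / kept_mass) for index, p in kept]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]
    print(len(nucleus_candidates(toy_logits, temperature=1.0, top_p=0.95)))
    print(len(nucleus_candidates(toy_logits, temperature=0.5, top_p=0.7)))
```

Lowering either value shrinks the pool of tokens the sampler can pick from, which is the lever being suggested here.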
1 comment

We are talking about dynamically quantizing the KV cache, not the model weights.
I run the KV cache at Q8 even on that model. Is it not working well for you?
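To make the distinction concrete, here is a minimal sketch of 8-bit block quantization applied to a slice of cached key/value activations. I am assuming "Q8" means something along the lines of per-block absmax scaling to int8 with 32 values per block; the function names and the block size are illustrative, not a specific runtime's implementation.

```python
import math

def quantize_q8(values, block_size=32):
    """Quantize floats to (scale, int8 list) blocks using per-block absmax scaling.

    Hypothetical helper; block_size=32 is an assumption for illustration.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = amax / 127.0 if amax > 0 else 1.0
        quants = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, quants))
    return blocks

def dequantize_q8(blocks):
    """Reconstruct approximate floats from (scale, int8 list) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


if __name__ == "__main__":
    # Stand-in for one row of cached keys; real caches hold model activations.
    cached_row = [math.sin(i) for i in range(64)]
    restored = dequantize_q8(quantize_q8(cached_row))
    print(max(abs(a - b) for a, b in zip(cached_row, restored)))
```

The weights stay untouched in this picture; only the cached activations are stored at lower precision, and the rounding error per value is bounded by half the block's scale.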