Hacker News
behohippy
493 days ago
Qwen is a little fussy about sampler settings, but it does run well quantized. If you were getting infinite repetition loops, try dropping top_p a bit. I think Qwen likes lower temperatures too.
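To show what the two knobs mentioned above do, here is a minimal, generic sketch of temperature scaling followed by nucleus (top-p) sampling over a handful of hypothetical next-token logits. It is not the sampler chain of any particular runtime (real engines apply samplers in their own order and usually add others such as top_k or repetition penalties), and the default values are illustrative assumptions, not recommended Qwen settings. Lowering temperature sharpens the distribution, and lowering top_p cuts off more of the low-probability tail, so both shrink the set of tokens the model can pick from.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.7, top_p=0.8, rng=random):
    """Temperature first, then top-p, then a weighted random draw (illustrative defaults)."""
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a 4-token vocabulary
    for temp, p in [(1.0, 0.95), (1.0, 0.8), (0.5, 0.8)]:
        kept = top_p_filter(softmax_with_temperature(logits, temp), p)
        print(f"temperature={temp} top_p={p} candidates={sorted(kept)}")
    # temperature=1.0 top_p=0.95 -> [0, 1, 2]
    # temperature=1.0 top_p=0.8  -> [0, 1]
    # temperature=0.5 top_p=0.8  -> [0]
```

With these toy logits, dropping top_p from 0.95 to 0.8 removes one candidate, and additionally lowering the temperature to 0.5 leaves only the single most likely token.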
Eisenstein
493 days ago
We are talking about dynamically quantizing the KV cache, not the model weights.
behohippy
491 days ago
I run the KV cache at Q8 even on that model. Is it not working well for you?
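The distinction in this exchange is that weights are quantized once, ahead of time, while the KV cache is filled token by token during generation, so its entries have to be quantized as they are written. The sketch below illustrates that idea with a toy cache that stores each appended key/value vector as symmetric 8-bit codes plus one float scale per block. The block size of 32 and the class and function names are assumptions for illustration; this is not how any particular runtime actually lays out its Q8 cache.

```python
BLOCK = 32  # assumed block size: one scale is shared by every 32 values


def quantize_q8(values, block=BLOCK):
    """Symmetric 8-bit quantization: per-block scale, integer codes in [-127, 127]."""
    blocks = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero blocks
        codes = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8(blocks):
    """Reconstruct approximate floats: each value is its code times its block's scale."""
    return [scale * q for scale, codes in blocks for q in codes]


class QuantizedKVCache:
    """Toy KV cache: vectors are quantized at append time, i.e. dynamically during inference."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        # Called once per generated token; the full-precision k/v are not kept.
        self.keys.append(quantize_q8(k))
        self.values.append(quantize_q8(v))

    def get(self, position):
        # Dequantize on read, e.g. when computing attention scores.
        return dequantize_q8(self.keys[position]), dequantize_q8(self.values[position])
```

Because each code is a rounded value of v / scale, the reconstruction error per element is at most half a scale step, i.e. about amax / 254 for its block, which is why 8-bit KV caches usually lose little accuracy.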