It depends heavily on the model and on how the context is used. A model like command-r, for instance, is practically unaffected by it, but Qwen will go nuts. Likewise, tasks that depend heavily on context, like translation or evaluation, will be hit harder than, say, code generation or creative output.
Qwen is a little fussy about sampler settings, but it does run well quantized. If you were getting infinite repetition loops, try dropping top_p a bit. I think Qwen likes lower temps too.
And q8_0 already roughly halves the memory usage compared to fp16.
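For a rough sense of that saving, here is a back-of-the-envelope sketch of the KV cache size. It assumes the ggml q8_0 layout (blocks of 32 int8 values plus one fp16 scale, so about 8.5 bits per value), and the model shape below is hypothetical, picked only to make the arithmetic easy to follow.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size in bytes for a single sequence."""
    # K and V each hold n_kv_heads * head_dim values per token, per layer.
    n_values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return n_values * bits_per_value / 8


# Hypothetical model shape (an assumption, not any specific model).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
# Assumed q8_0 cost: 32 int8 values + one fp16 scale per block = 8.5 bits/value.
q8_0 = kv_cache_bytes(**shape, bits_per_value=8.5)

print(f"fp16 cache: {fp16 / 2**30:.2f} GiB")
print(f"q8_0 cache: {q8_0 / 2**30:.2f} GiB ({q8_0 / fp16:.0%} of fp16)")
```

Because of the per-block scale, the q8_0 cache comes out slightly above half the fp16 size rather than exactly half.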
One of the Ollama devs called the quality impact negligible at q8_0: https://smcleod.net/2024/12/bringing-k/v-context-quantisatio...
But perhaps quantizing the KV cache does not scale as gracefully as quantizing the model itself?