Hacker News
by syntaxing, 59 days ago
Q8 or Q6_UD, with no KV cache quantization. I swear it matters even more with MoE models that have few activated parameters, despite the minimal difference in KL divergence.