Hacker News
by itsTyrion 79 days ago
> "wanted to run glm-4.7-flash:q8_0" > q8_0

a well-made smaller quant (e.g. one of unsloth's) will help a good amount here, without a notable reduction in output quality or increase in perplexity
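
for a rough sense of how much a smaller quant saves, here is a back-of-the-envelope sketch. The bits-per-weight figures follow from the GGUF block layouts (Q8_0 stores 32 int8 weights plus an fp16 scale, Q4_0 stores 32 4-bit weights plus an fp16 scale). The parameter count is a placeholder I made up for illustration, not the real size of glm-4.7-flash, and Q4_0 only stands in for "a smaller quant"; unsloth's quants mix precisions per tensor, so their actual file sizes will differ.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB.

    Ignores KV cache, activations and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 2**30


# Bytes per 32-weight block, converted to bits per weight.
Q8_0_BPW = 34 * 8 / 32  # 32 int8 values + 2-byte fp16 scale = 8.5 bpw
Q4_0_BPW = 18 * 8 / 32  # 16 bytes of packed 4-bit values + 2-byte scale = 4.5 bpw

# ASSUMPTION: placeholder parameter count, substitute the real one.
N_PARAMS = 30e9

for name, bpw in (("q8_0", Q8_0_BPW), ("q4_0", Q4_0_BPW)):
    print(f"{name}: ~{weight_gib(N_PARAMS, bpw):.1f} GiB of weights")
```

whatever the real parameter count is, the ratio holds: a 4.5 bpw quant needs a bit over half the memory of q8_0 for the weights, which is often the difference between fitting in VRAM and spilling to system RAM.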