Hacker News
by chriscappuccio 473 days ago
Better to run the Q8 model on a pair of EPYC CPUs with 768 GB of RAM; you'll get the same performance.
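A rough way to check whether a quantized model fits in that much RAM is to multiply the parameter count by the bits stored per weight. This is only a sketch under stated assumptions: the parameter count is hypothetical (the thread never gives one), it counts weights only (no KV cache or runtime overhead), and it assumes a simple block layout where every 32 weights share one 16-bit scale, as in llama.cpp's legacy Q4_0 and Q8_0 formats. The Q6 and Q5 rows just reuse that same overhead assumption.

```python
def weight_memory_gb(n_params: float, bits: int,
                     block_size: int = 32, scale_bits: int = 16) -> float:
    """Approximate memory for quantized weights only, in GB (1e9 bytes).

    Assumes each block of `block_size` weights stores `bits` per weight
    plus one shared scale of `scale_bits`. KV cache, activations and
    runtime overhead are ignored.
    """
    effective_bits = bits + scale_bits / block_size  # e.g. 8.5 for 8-bit
    return n_params * effective_bits / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
HYPOTHETICAL_PARAMS = 600e9
RAM_GB = 768  # the figure from the comment above

for bits in (8, 6, 5, 4):
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"Q{bits}: ~{gb:.1f} GB of weights, fits in {RAM_GB} GB: {gb < RAM_GB}")
```

The real headroom is smaller than this suggests, since the context window's KV cache and the OS also need RAM.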

The Q8 model is totally different?
My experience with quantizations is that anything below 6-bit is noticeably worse: coherence suffers. I've rarely gotten anything really useful out of a Q4 model, code-wise. For transformations they are great, though, e.g. converting JSON to Markdown and vice versa.
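To see why fewer bits hurt, here is a toy sketch of symmetric absmax quantization: each block of 32 weights is scaled so its largest magnitude maps to the top integer level, rounded, then scaled back. This is a simplification of my own, not how any particular runtime's K-quants work, and reconstruction error on random Gaussian weights is not the same thing as coherence or code quality. It only shows that rounding error roughly doubles with each bit removed.

```python
import random


def quantize_block(block: list[float], bits: int) -> list[float]:
    """Symmetric absmax quantization of one block, dequantized back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    # Scale so the largest |weight| maps to qmax; fall back to 1.0 for an all-zero block.
    scale = max(abs(x) for x in block) / qmax or 1.0
    return [round(x / scale) * scale for x in block]


def rms_error(weights: list[float], bits: int, block_size: int = 32) -> float:
    """Root-mean-square difference between original and dequantized weights."""
    total = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        deq = quantize_block(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, deq))
    return (total / len(weights)) ** 0.5


# Synthetic stand-in for model weights (assumption: Gaussian, mean 0, std 1).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(32 * 1000)]

for bits in (8, 6, 5, 4):
    print(f"{bits}-bit: RMS error {rms_error(weights, bits):.4f}")
```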
No, I mean the quantized versions of this model in particular have fewer parameters as well. They are almost different models.
I like Q5

The sweet spot for me