


	by chriscappuccio 473 days ago
	Better to run the Q8 model on a dual-EPYC machine with 768GB; you'll get the same performance.
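
As a rough check on whether a given quantization fits in 768GB, the size of the weights alone is about parameters × bits per weight ÷ 8. The sketch below assumes a hypothetical parameter count (`N_PARAMS`), since the thread does not state one, and the helper name `weight_footprint_gb` is made up for illustration. It ignores the per-block metadata that real quantized formats store and the extra memory needed for the KV cache and activations, so actual usage will be somewhat higher.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, for illustration only (not stated in the thread).
N_PARAMS = 600e9
# Memory figure quoted in the comment above.
RAM_GB = 768

for bits in (4, 5, 6, 8, 16):
    size = weight_footprint_gb(N_PARAMS, bits)
    fits = size < RAM_GB
    print(f"{bits:>2} bits/weight: {size:7.1f} GB, fits in {RAM_GB} GB: {fits}")
```

Swap in the real parameter count of the model in question to see which bit widths leave headroom.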


ltbarcly3 473 days ago

The Q8 model is totally different?


manmal 472 days ago

My experience with quantizations is that anything below Q6 is noticeably worse. Coherence suffers. I’ve rarely gotten anything really useful out of a Q4 model, code-wise. They are great for transformations though, e.g. converting JSON to Markdown and vice versa.
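
To give a feel for why fewer bits hurts, here is a minimal sketch that rounds synthetic weights to a fixed number of bits and measures the round-trip error. It assumes plain symmetric uniform quantization with a single scale for the whole tensor, which is a simplification: real quantization formats typically work block-wise with per-block scales, and numeric error is only a proxy for the loss of coherence described above. The function names and the Gaussian test data are assumptions for illustration.

```python
import random


def quantize_roundtrip(values, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) * scale for v in values]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
# Synthetic stand-in for model weights, not taken from any real model.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 6, 5, 4):
    err = rms_error(weights, quantize_roundtrip(weights, bits))
    print(f"{bits} bits: RMS round-trip error {err:.4f}")
```

Each bit removed roughly doubles the quantization step, so the error grows quickly as the bit width drops.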


ltbarcly3 471 days ago

No, I mean the quantized versions of this model in particular have fewer parameters as well. They are almost different models.


yieldcrv 472 days ago

I like Q5

The sweet spot for me
