| HN Mirror



	by kaibee 827 days ago
	I know quantizing larger models seems to be more forgiving, but I’m wondering if that applies less to these extreme-MoE models. It seems to me that it should be more like quantizing a 3B model.

1 comment

coder543 827 days ago

4-bit is fine for models of all sizes, in my experience.

The only reason I personally don’t quantize tiny models very much is that I don’t have to, not that the accuracy gains from running at 8-bit or fp16 are that great. I tried out 4-bit Phi-3 yesterday, and it was just fine.
