| HN Mirror



	by YetAnotherNick 603 days ago
	You don't need quantization-aware training on larger models. 4-bit 70B and 405B models exhibit close to zero degradation in output with post-training quantization [1][2].
	[1]: https://arxiv.org/pdf/2409.11055v1
	[2]: https://lmarena.ai/


WanderPanda 603 days ago

I wonder why that is. Is it because they are trained with dropout?


david-gpu 603 days ago

Probably because of how bloody large they are. The quantization errors likely cancel each other out over the sum of so many terms.
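
A toy sketch of that cancellation argument, under assumptions that are not from the thread: weights and inputs are independent Gaussians, and `quantize_4bit` is a hypothetical symmetric round-to-nearest quantizer with one scale per tensor. It compares the actual accumulated error of a dot product against the worst case where every per-term error has the same sign. Real layers have correlated activations and outliers, so treat this as intuition only.

```python
import random

def quantize_4bit(weights):
    # Hypothetical quantizer: symmetric round-to-nearest onto the 16
    # integer levels in [-8, 7], with a single per-tensor scale.
    scale = max(abs(w) for w in weights) / 7.0
    return [max(-8, min(7, round(w / scale))) * scale for w in weights]

def cancellation_ratio(n_terms, n_trials=20, seed=0):
    # Mean of |sum(e_i * x_i)| / sum(|e_i * x_i|), where e_i is the
    # quantization error of weight i. A value of 1.0 would mean the
    # errors never cancel; smaller values mean more cancellation.
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        x = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        err_terms = [(q - wi) * xi
                     for q, wi, xi in zip(quantize_4bit(w), w, x)]
        actual = abs(sum(err_terms))
        worst_case = sum(abs(e) for e in err_terms)
        ratios.append(actual / worst_case)
    return sum(ratios) / n_trials

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, cancellation_ratio(n))
```

In this model the ratio should shrink roughly like 1/sqrt(n) as the number of terms grows, which is the sense in which more terms give more cancellation.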

It's the same reason you can get a pretty good reconstruction when you add random noise to an image and then apply a binary threshold function to it. The more pixels there are, the more recognizable the B&W reconstruction will be.
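
The image analogy can be sketched the same way. The comment does not say what noise or threshold to use, so this assumes uniform noise in [-0.5, 0.5) and a threshold of 0.5 on gray levels in [0, 1]; the function names are made up for illustration. Under those assumptions each pixel comes out white with probability equal to its gray level, so the fraction of white pixels in a patch estimates the original gray, and the estimate improves with more pixels.

```python
import random

def dithered_patch_mean(gray, n_pixels, rng):
    # Add uniform noise to each pixel of a flat gray patch, threshold to
    # black/white, and return the fraction of white pixels.
    white = sum(1 for _ in range(n_pixels)
                if gray + rng.uniform(-0.5, 0.5) > 0.5)
    return white / n_pixels

def mean_abs_error(n_pixels, n_trials=200, seed=0):
    # Average reconstruction error over random gray levels in [0, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        gray = rng.random()
        total += abs(dithered_patch_mean(gray, n_pixels, rng) - gray)
    return total / n_trials

if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, mean_abs_error(n))
```

Each pixel keeps only one bit, yet the patch average recovers the gray level more closely as the pixel count grows.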
