	by GavCo 607 days ago
	When Meta releases the quantized 70B model, it will give another >2x speedup with similar accuracy: https://ai.meta.com/blog/meta-llama-quantized-lightweight-mo...


YetAnotherNick 607 days ago

You don't need quantization-aware training on larger models. 4-bit 70B and 405B models show close to zero degradation in output with post-training quantization [1][2].

[1]: https://arxiv.org/pdf/2409.11055v1
[2]: https://lmarena.ai/
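For readers unfamiliar with the term, the simplest form of post-training quantization is round-to-nearest (RTN): split the weights into groups, pick one scale per group, and round each weight to a 4-bit integer code. The sketch below is only that baseline, not necessarily the method behind the results in [1]; the group size of 64, the symmetric [-8, 7] code range, and the function names are illustrative assumptions.

```python
import random

def quantize_group_int4(weights):
    """Symmetric round-to-nearest 4-bit quantization of one weight group.

    Returns integer codes in [-8, 7] and the float scale that maps them back."""
    absmax = max(abs(w) for w in weights)
    # The largest magnitude maps to code 7; guard against an all-zero group.
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def quantize_rtn(weights, group_size=64):
    """Quantize a flat list of weights group by group; return the dequantized values."""
    out = []
    for start in range(0, len(weights), group_size):
        codes, scale = quantize_group_int4(weights[start:start + group_size])
        out.extend(c * scale for c in codes)  # dequantize: code * per-group scale
    return out

if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic weights, not a real checkpoint
    w_hat = quantize_rtn(w)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

No retraining is involved: each weight moves by at most half a quantization step of its group, which is why this is cheap to apply to a 70B or 405B checkpoint after the fact.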


WanderPanda 607 days ago

I wonder why that is. Is it because they are trained with dropout?


david-gpu 607 days ago

Probably because of how bloody large they are. The quantization errors likely cancel each other out over the sum of so many terms.
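The cancellation argument can be checked with a small simulation. Assuming each quantized weight carries an independent, zero-mean rounding error (a modelling assumption, not a measured property of any Llama checkpoint), the error of a dot product grows like sqrt(n) while the worst case, where every error lines up, grows like n. The uniform grid with step 0.01 below is a stand-in for a low-bit format, and the function names are hypothetical.

```python
import random

def quantize(w, step):
    """Round-to-nearest onto a uniform grid (a stand-in for a low-bit format)."""
    return round(w / step) * step

def cancellation_ratio(n, step=0.01, trials=200, seed=0):
    """Average of |actual dot-product error| / worst-case bound for length-n vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = [rng.gauss(0.0, 0.02) for _ in range(n)]  # synthetic weights
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # synthetic activations
        terms = [(quantize(wi, step) - wi) * xi for wi, xi in zip(w, x)]
        actual = abs(sum(terms))                # signed errors partially cancel
        worst = sum(abs(t) for t in terms)      # bound if all errors had the same sign
        total += actual / worst
    return total / trials

if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, round(cancellation_ratio(n, trials=20), 4))
```

Quadrupling n should roughly halve the ratio, the 1/sqrt(n) signature of independent errors. This only shows that rounding errors partially cancel inside one long sum; whether that fully explains why larger models tolerate 4-bit weights remains the commenter's hypothesis.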

It's the same reason you can get a pretty good reconstruction when you add random noise to an image and then apply a binary threshold to it: the more pixels there are, the more recognizable the B&W reconstruction becomes.
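The image analogy is random dithering. Here is a 1D sketch of it, assuming uniform noise spanning one full quantization step and block averaging as a stand-in for the viewer's eye (both assumptions, as are the function names). Each gray level is drawn as k pixels, binarized, and recovered by averaging: with noise, a pixel at gray level v turns white with probability exactly v, so the average converges to v as k grows; without noise, mid-gray detail is lost no matter how many pixels you use.

```python
import random

def dither_threshold(pixels, rng):
    """Add uniform noise in [-0.5, 0.5) and binarize at 0.5 (random dithering)."""
    return [1 if p + rng.uniform(-0.5, 0.5) > 0.5 else 0 for p in pixels]

def reconstruction_error(levels, k, noise=True, seed=0):
    """Render each gray level in [0, 1] as k pixels, binarize, then average each run of k back.

    Returns the mean absolute error between recovered and true gray levels."""
    rng = random.Random(seed)
    pixels = [v for v in levels for _ in range(k)]  # k pixels per gray level
    if noise:
        binary = dither_threshold(pixels, rng)
    else:
        binary = [1 if p > 0.5 else 0 for p in pixels]  # plain threshold, no noise
    recovered = [sum(binary[i * k:(i + 1) * k]) / k for i in range(len(levels))]
    return sum(abs(r - v) for r, v in zip(recovered, levels)) / len(levels)

if __name__ == "__main__":
    levels = [i / 100 for i in range(101)]  # a gray ramp from black to white
    for k in (4, 64, 1024):
        print(k, reconstruction_error(levels, k), reconstruction_error(levels, k, noise=False))
```

The noiseless column stays near 0.25 for every k, while the dithered error keeps shrinking as k grows, which is the "more pixels, more recognizable" effect.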


ipsum2 607 days ago

Probably not. The Cerebras chip only has 16-bit and 32-bit operators.
