by GavCo 607 days ago
When Meta releases the quantized 70B, it will give another >2x speedup with similar accuracy: https://ai.meta.com/blog/meta-llama-quantized-lightweight-mo...
2 comments

You don't need quantization-aware training on larger models. 4-bit 70B and 405B models exhibit close to zero degradation in output with post-training quantization [1][2].

[1]: https://arxiv.org/pdf/2409.11055v1
[2]: https://lmarena.ai/

I wonder why that is. Is it because they are trained with dropout?
Probably because of how bloody large they are. The quantization errors likely cancel each other out over the sum of so many terms.

Same reason why you can get a pretty good reconstruction when you add random noise to an image and then apply a binary threshold function to it. The more pixels there are, the more recognizable will be the B&W reconstruction.
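
A toy sketch of both intuitions above, in plain Python. It is not how any real model is quantized: the uniform random "weights", the 0.125 step (roughly a 4-bit grid over [-1, 1]), and the function names are all assumptions made up for illustration, and real weights and activations are not independent like this. The first function compares the summed rounding error against the worst case where every error has the same sign; the second adds noise to a flat gray patch, thresholds it to black and white, and checks how well the fraction of white pixels recovers the original brightness.

```python
import random

def quantize(w, step):
    """Round w to the nearest multiple of step (uniform quantization)."""
    return round(w / step) * step

def error_ratio(n, step=0.125, seed=0):
    """|sum of rounding errors| divided by the worst-case bound, for n random weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    total_error = sum(quantize(w, step) - w for w in weights)
    worst_case = n * step / 2  # every error at its maximum, all with the same sign
    return abs(total_error) / worst_case

def dithered_mean(brightness, n_pixels, seed=0):
    """Add uniform noise to a flat patch, threshold at 0.5, return the fraction of white pixels."""
    rng = random.Random(seed)
    white = 0
    for _ in range(n_pixels):
        noisy = brightness + rng.uniform(-0.5, 0.5)
        if noisy > 0.5:
            white += 1
    return white / n_pixels

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, error_ratio(n), dithered_mean(0.3, n))
```

Under these assumptions the summed error should be a shrinking fraction of the worst case as n grows (roughly like 1/sqrt(n)), and the white-pixel fraction should drift toward the original brightness of 0.3 as the patch gets larger.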

Probably not. The Cerebras chip only has 16-bit and 32-bit operators.