


	by corysama 490 days ago
	"Quantized" models try to approximate the full model using fewer bits per weight. "Distilled" models are other models (Llama, Qwen) that have been put through an additional training round using DeepSeek as a teacher.


beefnugs 488 days ago

What is the effect of using fewer bits? Is it like truncating hashes, where you start going off into the wrong thing entirely, or more like reduced accuracy, so that if you are talking about soft penguins it will start thinking you mean wet penguins?

And is there a domain-specific term I can look into if I wanted to read about someone trying to keep all the bits, but having the runtime (to save RAM) focus in on parts of the data instead of quantizing?

corysama 488 days ago

I'm pretty far from an expert. But, at its core, ML is a bunch of matrix multiplications glued together with non-linear functions. So, quantization leads to less accuracy in the matrices of weights. It's not like a hash, where 1 wrong bit makes the whole result meaningless.
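To make the contrast concrete, here is a minimal sketch in plain Python. It is not how BitsAndBytes or any real quantizer works internally. It assumes a simple symmetric "absmax" 4-bit scheme, and the helper names (`quantize`, `dequantize`, `dot`) and the random test data are made up for illustration. It rounds a vector of weights to 4-bit integers, compares a dot product before and after, and then flips a single input bit of a SHA-256 hash for comparison.

```python
import hashlib
import random


def quantize(weights, bits=4):
    """Symmetric absmax quantization: floats -> signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax     # one scale for the whole vector
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map the integer codes back to approximate floats."""
    return [c * scale for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical data: one row of a weight matrix and one input vector.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(256)]
inputs = [random.uniform(-1, 1) for _ in range(256)]

codes, scale = quantize(weights, bits=4)
approx = dequantize(codes, scale)

exact_out = dot(weights, inputs)
approx_out = dot(approx, inputs)
max_weight_error = max(abs(w - a) for w, a in zip(weights, approx))

print("largest per-weight error:", max_weight_error)
print("exact dot product:       ", exact_out)
print("4-bit dot product:       ", approx_out)

# Contrast: flip one bit of a hash input and count how many output bits change.
msg = b"soft penguins"
flipped = bytes([msg[0] ^ 1]) + msg[1:]
h1 = hashlib.sha256(msg).digest()
h2 = hashlib.sha256(flipped).digest()
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(h1, h2))
print("hash bits changed by a 1-bit flip:", differing_bits, "of 256")
```

Each weight moves by at most half a quantization step, so the output drifts a little rather than becoming garbage, while the hash changes in a way that has nothing to do with the original. Real schemes typically use per-block scales and non-uniform levels, so treat the numbers here as illustrative only.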

The folks who quantized DeepSeek say they used a piece of tech called "BitsAndBytes". https://unsloth.ai/blog/dynamic-4bit

Googling around for "bitsandbytes ai quantization" turns up this article, which looks nice:

https://generativeai.pub/practical-guide-of-llm-quantization...