Hacker News new | ask | show | jobs
by beefnugs 489 days ago
What is the effect of this less bits? Is it like truncating hashes where you start going off into the wrong thing entirely, or more like less accuracy so that if you are talking about soft penguins it will start thinking you mean wet penguins?

And is there a domain specific term I can look into if I wanted to read about someone trying to keep all the bits, but the runtime (trying to save ram) focusing in on parts of the data instead of this quantization?

1 comments

I'm pretty far from an expert. But, at it's core ML is a bunch of matrix multiplications glued together with non-linear functions. So, quantization leads to less accuracy in the matrices of weights. Not, changes in hashes where 1 wrong bit is meaningless.

The folks who quantized DeepSeek say they used a piece of tech called "BitsAndBytes". https://unsloth.ai/blog/dynamic-4bit

Googling around for "bitsandbytes ai quantization" turns up this article which looks nice

https://generativeai.pub/practical-guide-of-llm-quantization...