
I'm pretty far from an expert, but at its core ML is a bunch of matrix multiplications glued together with non-linear functions. So quantization just makes the weight matrices less precise, which makes the outputs somewhat less accurate. It's not like a hash, where one wrong bit makes the whole result meaningless.
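To make that concrete, here's a rough sketch in plain Python. The `quantize` helper is something I made up for illustration (simple absmax rounding to 4-bit levels); it is not what BitsAndBytes or any real quantizer actually does. It rounds a vector of random weights, compares the dot product before and after, and then flips a single bit in the input to a SHA-256 hash for contrast.

```python
import hashlib
import random


def quantize(weights, bits=4):
    """Hypothetical absmax quantizer: snap each weight to one of
    2**bits - 1 evenly spaced signed levels, then scale back to floats."""
    levels = 2 ** (bits - 1) - 1              # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def dot(a, b):
    """One row of a matrix multiplication."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)                        # fixed seed for repeatability
weights = [rng.uniform(-1, 1) for _ in range(256)]
inputs = [rng.uniform(-1, 1) for _ in range(256)]
quantized = quantize(weights)

# Quantization nudges every weight a little, so the output moves a little.
exact = dot(weights, inputs)
approx = dot(quantized, inputs)
abs_error = abs(exact - approx)
print(f"exact={exact:.4f} approx={approx:.4f} abs_error={abs_error:.4f}")

# A hash is the opposite: flip one input bit and the digest is unrelated.
data = b"model weights"
flipped = bytes([data[0] ^ 1]) + data[1:]
h1 = hashlib.sha256(data).hexdigest()
h2 = hashlib.sha256(flipped).hexdigest()
differing = sum(c1 != c2 for c1, c2 in zip(h1, h2))
print(f"hex digits that differ after a 1-bit flip: {differing} of {len(h1)}")
```

Each weight moves by at most half a quantization step, so the error in the dot product is bounded and degrades smoothly as you drop bits. The hash has no such notion of "close".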

The folks who quantized DeepSeek say they used a library called "BitsAndBytes": https://unsloth.ai/blog/dynamic-4bit

Googling around for "bitsandbytes ai quantization" turns up this article, which looks nice:

https://generativeai.pub/practical-guide-of-llm-quantization...