by beefnugs
489 days ago
What is the effect of using fewer bits? Is it like truncating hashes, where you start going off into the wrong thing entirely, or more like reduced accuracy, so that if you are talking about soft penguins it will start thinking you mean wet penguins? And is there a domain-specific term I can look into if I wanted to read about someone trying to keep all the bits, but with the runtime (trying to save RAM) focusing in on parts of the data instead of quantizing?
|
The folks who quantized DeepSeek say they used a piece of tech called "bitsandbytes": https://unsloth.ai/blog/dynamic-4bit
Googling around for "bitsandbytes ai quantization" turns up this article, which looks nice:
https://generativeai.pub/practical-guide-of-llm-quantization...
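
On the "truncated hash vs. less accuracy" part, a toy example might help. This is only a rough sketch of plain symmetric "absmax" rounding to 4 bits, not what bitsandbytes or Unsloth actually do (as far as I know their schemes are block-wise and more elaborate). The function names and the random "weights" are made up for illustration. The point it shows is that each value lands near its original, with an error bounded by half a quantization step, rather than somewhere unrelated as you would get from a truncated hash:

```python
import random


def quantize_absmax(weights, bits=4):
    """Map each float to a small signed integer code (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit: codes in [-7, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Turn the integer codes back into approximate floats."""
    return [c * scale for c in codes]


random.seed(0)
# Stand-in for a layer's weights; real weights are an assumption left out here.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

codes, scale = quantize_absmax(weights, bits=4)
restored = dequantize(codes, scale)

max_err = max(abs(w - r) for w, r in zip(weights, restored))
print("distinct codes used:", len(set(codes)))
print("step size:", scale)
print("worst-case error:", max_err)
print("error stays within half a step:", max_err <= scale / 2 + 1e-12)
```

So in this simple picture it is closer to the "soft penguins vs. wet penguins" case: every weight is nudged a little, and the model's behavior drifts gradually as the steps get coarser.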