by AnthonyMouse
848 days ago
> Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?

There is an inherent trade-off between model size and quality. Quantization reduces model size at the expense of quality. Sometimes it's a better way to do that than reducing the number of parameters, but it's still fundamentally the same trade-off. You can't make the highest quality model use the smallest amount of memory. It's information theory, not sorcery.
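To make the trade-off concrete, here is a toy sketch, not how any real inference library quantizes. It assumes plain symmetric uniform quantization with one scale for the whole tensor, and uses random Gaussian numbers as a stand-in for model weights. The helper names `quantize_roundtrip` and `mean_squared_error` are made up for this illustration.

```python
import random

def quantize_roundtrip(values, bits):
    """Quantize to a signed `bits`-bit grid with one shared scale, then dequantize."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values]

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Assumption: Gaussian noise stands in for a real weight tensor.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {
    bits: mean_squared_error(weights, quantize_roundtrip(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit round trip: MSE = {err:.6f}")
```

Fewer bits means a coarser grid, so the reconstruction error goes up as the storage goes down. Real schemes (per-group scales, non-uniform grids) move the curve but don't remove it.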
Quantization is essential for me since a 7B model won't fit on my RTX 2060 with only 6GB of VRAM. It allows me to compress the model so it can run on my hardware.
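Rough arithmetic behind that, as a sketch: it counts weight storage only, assuming exactly 7e9 parameters and 1 GB = 1e9 bytes, and it ignores the KV cache, activations, and the per-block scale overhead that real quantized formats carry. `weight_gb` is a name invented for this example.

```python
def weight_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 7e9   # assumption: "7B" taken as exactly 7 billion parameters
VRAM_GB = 6.0    # the 6GB card from the comment above

for bits in (16, 8, 4):
    size = weight_gb(N_PARAMS, bits)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{bits}-bit weights: {size:.1f} GB, {verdict} in {VRAM_GB:.0f} GB")
```

Under those assumptions, 16-bit and 8-bit weights are both over 6 GB on their own, and 4-bit weights leave some headroom for everything else.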