|
|
|
|
|
by WhitneyLand
149 days ago
|
|
Its often pointed out in the first sentence of a comment how a model can be run at home, then (maybe) towards the end of the comment it’s mentioned how it’s quantized. Back when 4k movies needed expensive hardware, no one was saying they could play 4k on a home system, then later mentioning they actually scaled down the resolution to make it possible. The degree of quality loss is not often characterized. Which makes sense because it’s not easy to fully quantify quality loss with a few simple benchmarks. By the time it’s quantized to 4 bits, 2 bits or whatever, does anyone really have an idea of how much they’ve gained vs just running a model that is sized more appropriately for their hardware, but not lobotomized? |
|
int4 quantization is the original release in this case; it's not been quantized after the fact. It's a bit of a nuisance when running on hardware that doesn't natively support the format (might waste some fraction of memory throughput on padding, specifically on NPU hw that can't do the unpacking on its own) but no one here is reducing quality to make the model fit.