by throw-qqqqq
307 days ago
> I struggle to comprehend how an odd quantization like 5 bit, that doesn't align well with 8 bit boundaries, would not slow things down for inference

Who says it doesn’t :)? At least in my tests there is a big penalty to using an “odd” bit stride. Testing 4-bit vs 5-bit quantization in llama.cpp, I see quite a bit more than the “naively expected” 25% slowdown from 4 to 5 bits.
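To make the “odd stride” point a bit more concrete, here is a rough Python sketch. It assumes a naive layout where values are packed back to back in one contiguous bitstream, which is only an illustration and not necessarily how llama.cpp lays out its quantized blocks. The helpers `pack`, `unpack` and `straddlers` are made-up names for this example. It shows where the “naive” 25% comes from (5/4 as many bytes to read) and why 5-bit values need extra work on top of that: half of them cross a byte boundary.

```python
def pack(values, bits):
    """Pack unsigned ints into a little-endian bitstream, `bits` bits each."""
    acc = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << bits)
        acc |= v << (i * bits)
    nbytes = (len(values) * bits + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack(data, bits, count):
    """Extract `count` values of `bits` bits each (bits <= 8)."""
    mask = (1 << bits) - 1
    out = []
    for i in range(count):
        byte, shift = divmod(i * bits, 8)
        # Read a 16-bit window so a value spanning two bytes is covered.
        hi = data[byte + 1] if byte + 1 < len(data) else 0
        window = data[byte] | (hi << 8)
        out.append((window >> shift) & mask)
    return out


def straddlers(bits, count):
    """Count values whose first and last bit land in different bytes."""
    return sum(
        1
        for i in range(count)
        if (i * bits) // 8 != (i * bits + bits - 1) // 8
    )


values = [i % 16 for i in range(32)]  # 32 values that fit in 4 bits
q4 = pack(values, 4)
q5 = pack(values, 5)

print(len(q4), len(q5), len(q5) / len(q4))    # 16 20 1.25
print(straddlers(4, 32), straddlers(5, 32))   # 0 16
```

With 4 bits every value sits inside a single byte, so decoding is one load plus a shift and a mask. With 5 bits in this naive layout, 16 of every 32 values span two bytes, so the decoder needs a wider load or a second load and a merge. That extra shuffling is one plausible reason the slowdown can exceed the 25% you would expect from byte count alone.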