by claiir
426 days ago
Yeah, they mention a “perplexity drop” relative to naive quantization, but that’s meaningless to me.
> We reduce the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.

Wish they showed benchmarks / added quantized versions to the arena! :>
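For what it's worth, here's one plausible reading of that 54% figure, as a minimal sketch. It assumes "perplexity drop" means the increase in perplexity of the quantized model over the unquantized baseline, and that the 54% is the relative shrinkage of that increase versus naive quantization. The function names are hypothetical, and the numbers in the usage line are made up for illustration, not taken from their evaluation. Perplexity is exp of the mean per-token negative log-likelihood.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def drop_reduction(ppl_baseline, ppl_naive, ppl_improved):
    """Hypothetical helper: fraction by which the perplexity increase over
    the unquantized baseline shrinks when going from naive quantization to
    the improved method (assumes ppl_naive > ppl_baseline)."""
    naive_drop = ppl_naive - ppl_baseline
    improved_drop = ppl_improved - ppl_baseline
    return 1.0 - improved_drop / naive_drop


# Made-up illustrative numbers: baseline 6.00, naive Q4_0 6.50, improved 6.23.
print(round(drop_reduction(6.00, 6.50, 6.23), 2))
```

Which is kind of the point: a 54% smaller degradation can still be small or large in absolute terms depending on the baseline, so it doesn't say much about downstream quality on its own.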