Hacker News
by helloericsf 730 days ago
4-bit quantization tends to come at the cost of output quality. https://github.com/ggerganov/llama.cpp/issues/9
1 comment

Some quality loss with quantization is expected. With GPTQ, the loss appears to be within an acceptable range, based on the perplexity scores shown.
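For anyone unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood per token, so lower is better, and a quantized model is usually judged by how much its perplexity rises over the full-precision baseline. Here is a rough sketch of that comparison. The log-probabilities are made up for illustration and are not taken from the linked issue, and the `perplexity` helper is a hypothetical name, not part of llama.cpp or GPTQ.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token log-probabilities (illustrative values only).
baseline_logprobs = [math.log(0.5)] * 4   # stand-in for a full-precision model
quantized_logprobs = [math.log(0.4)] * 4  # stand-in for a 4-bit quantized model

baseline_ppl = perplexity(baseline_logprobs)
quantized_ppl = perplexity(quantized_logprobs)

# Relative perplexity increase attributable to quantization in this toy case.
relative_increase = (quantized_ppl - baseline_ppl) / baseline_ppl

print(f"baseline:  {baseline_ppl:.3f}")
print(f"quantized: {quantized_ppl:.3f}")
print(f"increase:  {relative_increase:.1%}")
```

What counts as "acceptable" depends on the use case; the number only says how much less likely the quantized model finds the evaluation text on average.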