Hacker News
by ssheng 730 days ago
How does Exllama rank among these? Heard good things about it.
2 comments
helloericsf 730 days ago
Seems interesting!
https://github.com/turboderp/exllama
"A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."
helloericsf 730 days ago
4-bit quantization tends to come at the cost of some output quality.
https://github.com/ggerganov/llama.cpp/issues/9
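To make the quality-loss point concrete, here is a minimal sketch of where 4-bit error comes from, assuming plain symmetric round-to-nearest quantization with one scale per group of 32 weights and random Gaussian stand-in weights (all assumptions for illustration). This is NOT GPTQ, which additionally uses approximate second-order information to compensate for rounding error; it only shows that mapping each weight to one of 16 levels loses precision.

```python
import random

def quantize_4bit(weights, group_size=32):
    """Symmetric round-to-nearest 4-bit quantization with one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group to 7 (signed 4-bit range is [-8, 7]).
        # Fall back to 1.0 for an all-zero group to avoid dividing by zero.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize(codes, scales, group_size=32):
    """Rebuild approximate float weights from the integer codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in weights, not a real model
codes, scales = quantize_4bit(weights)
restored = dequantize(codes, scales)

err = sum((w - r) ** 2 for w, r in zip(weights, restored)) ** 0.5
norm = sum(w * w for w in weights) ** 0.5
print(f"relative reconstruction error: {err / norm:.3f}")
```

Each weight can be off by up to half a step (`scale / 2`), so groups with one large outlier get a coarse step for every other weight in the group, which is one reason smarter schemes than plain rounding exist.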
ssheng 730 days ago
Some quality loss from quantization is expected. Based on the perplexity scores shown, the loss with GPTQ seems to be within an acceptable range.
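For reference, perplexity is the exponential of the average negative log-likelihood per token, so lower is better and a small rise after quantization means a small quality loss. The sketch below assumes you already have natural-log per-token log-probabilities from each model on the same evaluation text; the numbers are made up for illustration and are not taken from any linked results.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical per-token log-probabilities for the same text under two models.
fp16_logprobs = [-1.2, -0.4, -2.1, -0.9, -1.5]
q4_logprobs = [-1.3, -0.5, -2.2, -0.9, -1.6]

print(f"fp16 perplexity:  {perplexity(fp16_logprobs):.3f}")
print(f"4-bit perplexity: {perplexity(q4_logprobs):.3f}")
```

Comparisons like this are only meaningful when both models use the same tokenizer and evaluation text; otherwise the per-token averages are not measuring the same thing.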