HN Mirror



	by ssheng 730 days ago
	How does ExLlama rank among these? Heard good things about it.

2 comments

helloericsf 730 days ago

Seems interesting! https://github.com/turboderp/exllama "A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."


helloericsf 730 days ago

4-bit quantization tends to come at the cost of output quality. https://github.com/ggerganov/llama.cpp/issues/9
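
For a rough feel of where the loss comes from, here is a minimal sketch of plain symmetric round-to-nearest 4-bit quantization. This is an assumption for illustration only: it is not the scheme llama.cpp or GPTQ actually uses, and the weight values and function names are made up.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to integer codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole group; the largest magnitude maps to code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Hypothetical weights, not taken from any real model.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.02]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)

# Worst-case rounding error per weight is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

With only 16 levels per group, every weight gets snapped to the nearest multiple of `scale`, so the error grows with the spread of values sharing that scale.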


ssheng 730 days ago

Quality loss with quantization is expected. With GPTQ, the loss seems to be within an acceptable range, based on the perplexity scores shown.
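
For reference, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. A minimal sketch of comparing two models this way, assuming you already have per-token log-probabilities; the numbers below are invented for illustration and are not measurements of any real model or of GPTQ:

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical probabilities each model assigned to the correct next token.
fp16_log_probs = [math.log(p) for p in (0.50, 0.25, 0.40, 0.10)]
q4_log_probs = [math.log(p) for p in (0.48, 0.24, 0.38, 0.09)]

ppl_fp16 = perplexity(fp16_log_probs)
ppl_q4 = perplexity(q4_log_probs)

# Relative perplexity increase is one way to judge whether the loss is acceptable.
relative_increase = (ppl_q4 - ppl_fp16) / ppl_fp16
print(ppl_fp16, ppl_q4, relative_increase)
```

What counts as "acceptable" is a judgment call that depends on the task; the comparison only shows how much worse the quantized model predicts the same text.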
