

	by nshm 1189 days ago
	Do you have the numbers? I suspect it is way worse. The original llama.cpp authors never measured any numbers either.

3 comments

ddren 1189 days ago

The Python implementation[1] ran some tests using the same quantization algorithm as llama.cpp (4-bit RTN, i.e. round-to-nearest).

1: https://github.com/qwopqwop200/GPTQ-for-LLaMa
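
For readers unfamiliar with RTN, here is a minimal sketch of round-to-nearest quantization on a single block of weights. It assumes a symmetric absmax scale per block; the function names and the block handling are illustrative only and are not taken from llama.cpp or GPTQ-for-LLaMa, whose actual storage formats may differ.

```python
def rtn_quantize(weights, bits=4):
    """Quantize one block of floats to signed integer codes (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4 bits; codes span [-8, 7]
    absmax = max(abs(w) for w in weights)
    # Assumption: one shared scale per block, chosen so the largest weight maps to qmax.
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round each scaled weight to the nearest integer, then clamp to the code range.
    # Note: Python's round() breaks exact ties toward the even integer.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def rtn_dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


block = [1.4, -0.6, 0.25, 0.0, -1.1, 0.9]
codes, scale = rtn_quantize(block)
restored = rtn_dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(block, restored))
print(codes, scale, max_err)
```

Each weight is rounded independently, so the per-weight error is bounded by half the scale. GPTQ, by contrast, adjusts the remaining weights to compensate for rounding error, which is why the two methods are benchmarked separately.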


nshm 1189 days ago

Great, thanks a lot.

So we have numbers: on PTB, the original perplexity is 8.79 and the quantized one is 9.68, already 10% worse. And the PPL is reported per token, I suppose? Word-level PPL for PTB should be around 20, not less than 10.

Any numbers on more complex tasks, then? Like QA?
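
To make the arithmetic explicit, here is a small sketch of the two calculations above: the relative perplexity increase, and the conversion from per-token to per-word perplexity. The conversion follows from both perplexities being the exponential of the same total negative log-likelihood, divided by a different count. The tokens-per-word ratio below is a made-up placeholder, not a measured value for PTB or the LLaMA tokenizer.

```python
def relative_increase(baseline_ppl, quantized_ppl):
    """Fractional increase in perplexity after quantization."""
    return quantized_ppl / baseline_ppl - 1.0


def word_perplexity(token_ppl, tokens_per_word):
    """Convert per-token perplexity to per-word perplexity.

    PPL_token = exp(NLL / n_tokens) and PPL_word = exp(NLL / n_words),
    so PPL_word = PPL_token ** (n_tokens / n_words).
    """
    return token_ppl ** tokens_per_word


# Numbers quoted in the thread: 8.79 original, 9.68 quantized.
print(f"relative increase: {relative_increase(8.79, 9.68):.3f}")

# Hypothetical ratio, for illustration only; the real value depends on the tokenizer.
assumed_tokens_per_word = 1.3
print(f"word-level PPL: {word_perplexity(8.79, assumed_tokens_per_word):.2f}")
```

Whenever a tokenizer splits words into more than one token on average, the word-level figure comes out higher than the token-level one, which is consistent with the suspicion that 8.79 is a per-token number.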


summarity 1189 days ago

Some numbers here: https://github.com/qwopqwop200/GPTQ-for-LLaMa#result


sottol 1189 days ago

They're using GPTQ; here you go: https://arxiv.org/abs/2210.17323. The authors benchmarked two model families across a wide range of parameter counts.


ddren 1189 days ago

llama.cpp is using RTN at the moment.
