1: https://github.com/qwopqwop200/GPTQ-for-LLaMa
So on PTB we have an original perplexity of 8.79 and a quantized perplexity of 9.68, which is already about 10% worse. And I suppose the PPL is reported per token? Word-level PPL on PTB should be around 20, not below 10.
Are there any numbers on more complex tasks, like QA?
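To make the arithmetic behind the perplexity question concrete, here is a rough sketch. It assumes both figures are exponentiated mean negative log-likelihoods over the same text, so a per-token PPL converts to a per-word PPL by raising it to the average number of tokens per word. The function names are made up for illustration and are not taken from the repository.

```python
import math


def relative_increase(baseline_ppl: float, quantized_ppl: float) -> float:
    """Fractional perplexity increase of the quantized model over the baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl


def token_to_word_ppl(token_ppl: float, tokens_per_word: float) -> float:
    """Convert per-token PPL to per-word PPL.

    Assumes the total negative log-likelihood is the same in both cases and
    only the normalizer (token count vs. word count) differs.
    """
    return token_ppl ** tokens_per_word


def implied_tokens_per_word(token_ppl: float, word_ppl: float) -> float:
    """Tokens-per-word ratio that would reconcile a token PPL with a word PPL."""
    return math.log(word_ppl) / math.log(token_ppl)


if __name__ == "__main__":
    # Numbers quoted in the comment above.
    print(f"relative increase: {relative_increase(8.79, 9.68):.1%}")
    # Ratio needed for a token PPL of 8.79 to correspond to a word PPL of 20.
    print(f"implied tokens per word: {implied_tokens_per_word(8.79, 20.0):.2f}")
```

If the reported 8.79 is indeed per token, a word-level PPL near 20 would need roughly 1.4 tokens per word under this assumption. Whether that matches the tokenizer actually used is something only the authors can confirm.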