HN Mirror



	by cypress66 1193 days ago
	I believe the performance loss is because this uses RTN (round-to-nearest) quantization. If you use the "4chan version", which is 4-bit GPTQ, the performance loss from quantization should be very small.
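
For readers unfamiliar with the term, here is a rough sketch of what round-to-nearest quantization means for one small group of weights. It is an illustration only, not the code from llama.cpp or the linked repo: the function names, the symmetric absmax scaling, and the sample weights are all assumptions made for the example. Each weight is rounded independently to the nearest of a few integer levels, so nothing compensates for the rounding error. GPTQ, by contrast, uses calibration data to adjust the remaining weights as it quantizes, which is why its loss tends to be smaller at the same bit width.

```python
def rtn_quantize(weights, bits=4):
    """Round-to-nearest quantization of one weight group.

    Assumes a symmetric scheme with a single absmax-derived scale
    (a hypothetical choice for illustration, not a specific library's format).
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0    # avoid dividing by zero
    # Each weight is rounded on its own, then clamped to the integer range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Made-up sample weights, purely for demonstration.
weights = [0.12, -0.53, 0.31, 0.07, -0.02, 0.70, -0.44, 0.25]
q, scale = rtn_quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The per-weight error here is bounded by half the scale, but these errors are never corrected against the layer's actual outputs, which is the gap GPTQ tries to close.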

1 comment

What's the 4chan version?

See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo was originally posted on 4chan, is all, but the code is on GitHub)