Hacker News
by
cypress66
1193 days ago
I believe the performance loss is because this uses RTN (round-to-nearest) quantization. If you use the "4chan version", which is 4-bit GPTQ, the performance loss from quantization should be very small.
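For anyone unfamiliar with the term, here is a minimal sketch of round-to-nearest quantization, assuming a symmetric 4-bit range and a single absmax scale per block of weights. It is an illustration only, not llama.cpp's actual code, and the names `rtn_quantize` and `dequantize` are made up for this example.

```python
def rtn_quantize(weights, bits=4):
    """Quantize a block of floats to signed ints by rounding to nearest.

    Assumption: one shared scale per block, derived from the largest
    absolute value (absmax). Real implementations may differ.
    """
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit signed
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight independently, then clamp to the int range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical block of weights, purely for illustration.
weights = [0.12, -0.53, 0.97, -0.08, 0.40, -1.10, 0.33, 0.02]
q, scale = rtn_quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is rounded on its own, so the per-weight error can be up to half a quantization step and nothing compensates for it. GPTQ, as I understand it, instead adjusts the not-yet-quantized weights to offset the error introduced by each rounding step, using calibration data, which is why it tends to lose less quality at the same bit width.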
xdennis
1193 days ago
What's the 4chan version?
aseipp
1193 days ago
See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo was originally posted on 4chan, is all, but the code is on GitHub)
cypress66
1193 days ago
https://rentry.org/llama-tard-v2