Hacker News
by
maxloh
59 days ago
I thought Q4_K_M was the standard. Why did you choose the 6-bit variant? Does it generate better output?
1 comments
SlavikCA
59 days ago
There is no standard.
The more bits the quantization keeps per weight, the better the results, but the more memory is needed. Q8 is the best.
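To put rough numbers on the memory side of that trade-off, here is a minimal sketch. The 7B parameter count is a hypothetical example, and the bits-per-weight figures are assumptions taken from the nominal block layouts of the llama.cpp quantization types. Real files such as Q4_K_M mix several types across tensors, so actual sizes will differ somewhat, and the KV cache and activations are not counted.

```python
# Assumed effective bits per weight, including per-block scale overhead.
# These are nominal figures for the block formats, not measured file sizes.
BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K": 4.5,
}


def weight_memory_gib(n_params: int, bits_per_weight: float) -> float:
    """Estimate the memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter model
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: {weight_memory_gib(n_params, bpw):6.2f} GiB")
```

The estimate scales linearly with bits per weight, so under these assumptions a 6-bit variant needs roughly three quarters of the memory of Q8 and about one and a half times that of a 4-bit variant.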
SV_BubbleTime
59 days ago
FP32 is best, although I wonder if there isn’t something better I don’t know about. Q8 is for the most part equal to FP16 in practical terms by being smart about what is quantized, but iirc always slower than FP16 and FP8.