by Ambix
471 days ago
I ran my own experiments, and, surprisingly, Q4_K_M models often outperform Q6 and Q8 quantized models. For bigger models (in the 8B to 70B range), Q4_K_M is very good: I saw no noticeable degradation compared to the full FP16 models.