by simjnd
57 days ago
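Thanks for bringing this up. I looked into it, and if I understood correctly:
- Q4_0 (not a K quant) is the traditional flat quantization: each block of 32 weights gets a single scale, and every weight is stored in 4 bits.
- Q4_K (4-bit K quant) splits the weights into super-blocks of 256, and each sub-block of 32 gets its own scale and min. The common Q4_K_M mix stores some of the more sensitive tensors (such as parts of the attention value and feed-forward down projections) as Q6_K, so some weights get higher precision, but the model is still largely 4-bit.
- IQ4 (IQ4_NL, IQ4_XS) maps the weights onto a non-uniform 4-bit grid with a scale optimized to reduce error (including clipping), but all the weights are still 4-bit.

An imatrix is optional for both Q4_K and IQ4: when one is supplied, the quantization error is weighted by how important each weight is. And yeah, most quants nowadays are K quants, and many of them are made with an imatrix.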
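To make the Q4_0 vs. Q4_K difference concrete, here is a rough sketch in plain Python. The flat version keeps one float scale per block of 32 with codes in [-8, 7]. The K-quant-like version is loosely modeled on the Q4_K layout (256-weight super-blocks, 8 sub-blocks of 32, each sub-block with a 6-bit scale and a 6-bit min relative to two per-super-block floats). The function names are hypothetical, and the real llama.cpp code searches for scales more carefully and packs the bits, which this skips.

```python
import random

BLOCK = 32   # weights per block (Q4_0) or per sub-block (K quant)
SUPER = 256  # weights per K-quant super-block (8 sub-blocks)


def quantize_q4_0_like(x):
    """Flat 4-bit: one float scale per block of 32, codes in [-8, 7]."""
    assert len(x) == BLOCK
    vmax = max(x, key=abs)               # signed value with the largest magnitude
    d = vmax / -8 if vmax != 0 else 0.0  # vmax maps exactly onto code -8
    inv = 1.0 / d if d else 0.0
    return d, [max(-8, min(7, round(v * inv))) for v in x]


def quantize_k_like(x):
    """Two-level 4-bit: per-sub-block scale/min, themselves stored as 6-bit ints."""
    assert len(x) == SUPER
    subs = [x[i:i + BLOCK] for i in range(0, SUPER, BLOCK)]
    # First level: x ~= scale * q - offset with q in [0, 15] for each sub-block.
    scales, offsets = [], []
    for s in subs:
        lo = min(min(s), 0.0)  # the offset is never negative
        scales.append((max(s) - lo) / 15)
        offsets.append(-lo)
    # Second level: one float per super-block for scales and one for offsets;
    # each sub-block keeps only a 6-bit multiplier (0..63) of those floats.
    d = max(scales) / 63 or 1.0
    dmin = max(offsets) / 63 or 1.0
    sc = [round(v / d) for v in scales]
    mn = [round(v / dmin) for v in offsets]
    codes = []
    for s, a, b in zip(subs, sc, mn):
        step, off = d * a, dmin * b
        inv = 1.0 / step if step else 0.0
        codes.append([max(0, min(15, round((v + off) * inv))) for v in s])
    return d, dmin, sc, mn, codes


def dequantize_k_like(d, dmin, sc, mn, codes):
    """Rebuild weights as d * scale * q - dmin * min for every sub-block."""
    out = []
    for a, b, cs in zip(sc, mn, codes):
        out.extend(d * a * c - dmin * b for c in cs)
    return out


random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(SUPER)]

d0, q0 = quantize_q4_0_like(x[:BLOCK])
flat = [d0 * c for c in q0]

k = quantize_k_like(x)
two_level = dequantize_k_like(*k)
```

Either way every weight ends up as a 4-bit code; the K-quant layout just spends its extra bits on finer-grained scales and offsets rather than on the weights themselves.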
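For the IQ4 point, a second rough sketch: each weight in a block of 32 is mapped onto a fixed, non-uniform set of 16 levels, and the block scale is picked by trying several candidates and refitting each with weighted least squares, so the weights that matter most (per the importance values) are reconstructed best. The level table is assumed to match llama.cpp's `kvalues_iq4nl`; the candidate search, the 2% spacing, and the `importance` values (a hypothetical stand-in for imatrix entries) are simplifications for illustration, not llama.cpp's exact procedure.

```python
import random

# 16 non-uniform levels; assumed to match llama.cpp's kvalues_iq4nl table.
KVALUES = [-127, -104, -83, -65, -49, -35, -22, -10,
           1, 13, 25, 38, 53, 69, 89, 113]


def nearest_index(v):
    """Index of the grid level closest to v."""
    return min(range(len(KVALUES)), key=lambda i: abs(KVALUES[i] - v))


def weighted_error(x, w, d, idx):
    """Importance-weighted squared error of x ~= d * KVALUES[idx]."""
    return sum(wi * (xi - d * KVALUES[i]) ** 2 for xi, wi, i in zip(x, w, idx))


def quantize_nl_like(x, weights=None, ntry=7):
    """Return (d, idx) with x[j] ~= d * KVALUES[idx[j]]; weights must be positive."""
    w = weights if weights is not None else [1.0] * len(x)
    vmax = max(x, key=abs)
    if vmax == 0:
        return 0.0, [nearest_index(0.0)] * len(x)
    d0 = vmax / KVALUES[0]  # maps the largest-magnitude weight onto -127
    best_d, best_idx, best_err = d0, None, float("inf")
    for t in range(-ntry, ntry + 1):
        d = d0 * (1 + 0.02 * t)  # hypothetical candidate spacing
        idx = [nearest_index(xi / d) for xi in x]
        # Refit the scale for these assignments by weighted least squares.
        num = sum(wi * xi * KVALUES[i] for xi, wi, i in zip(x, w, idx))
        den = sum(wi * KVALUES[i] ** 2 for wi, i in zip(w, idx))
        d_fit = num / den
        err = weighted_error(x, w, d_fit, idx)
        if err < best_err:
            best_d, best_idx, best_err = d_fit, idx, err
    return best_d, best_idx


random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(32)]
importance = [abs(v) + 0.1 for v in block]  # hypothetical imatrix stand-in
d, idx = quantize_nl_like(block, importance)
recon = [d * KVALUES[i] for i in idx]
```

Every weight still gets exactly one 4-bit index; the importance values only change which scale wins, not how many bits a weight gets.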