by simjnd 57 days ago
Thanks for bringing this up. I looked into it, and if I understood correctly:

- Q4_0 (not a K quant) is the traditional flat quantization
- Q4_K (4-bit K quant) uses an imatrix, and important weights get higher precision (5-6 bits instead of 4, but still largely 4 bits)
- IQ4 uses an imatrix, and important weights get an optimized scale to avoid clipping at 4-bit, but all the weights are still 4-bit

And yeah, most quants nowadays are K quants, which are importance weighted.
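
To make the "flat" vs "importance weighted" distinction concrete, here is a rough Python sketch of the general idea. It is not llama.cpp's actual block layout or algorithm: the block size of 32, the symmetric [-8, 7] code range, the scale search range, and the random importance values are all assumptions picked for illustration, and the function names are made up.

```python
import random

BLOCK = 32  # weights per block (assumed size for this sketch)

def quantize_block(block, scale):
    """Round each weight to a signed 4-bit code in [-8, 7]."""
    if scale == 0.0:
        return [0] * len(block)
    return [max(-8, min(7, round(x / scale))) for x in block]

def flat_scale(block):
    """Flat quantization: the scale depends only on the largest magnitude."""
    return max(abs(x) for x in block) / 7.0

def weighted_error(block, codes, scale, importance):
    """Squared reconstruction error, weighted per weight by its importance."""
    return sum(w * (x - q * scale) ** 2
               for x, q, w in zip(block, codes, importance))

def importance_scale(block, importance, steps=41):
    """Try scales near the flat one and keep the lowest weighted error."""
    base = flat_scale(block)
    best = base
    best_err = weighted_error(block, quantize_block(block, base), base, importance)
    for i in range(steps):
        cand = base * (0.7 + 0.6 * i / (steps - 1))  # 0.7x to 1.3x of flat scale
        err = weighted_error(block, quantize_block(block, cand), cand, importance)
        if err < best_err:
            best, best_err = cand, err
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(BLOCK)]
    importance = [rng.random() for _ in range(BLOCK)]  # stand-in for imatrix data
    for name, s in (("flat", flat_scale(weights)),
                    ("importance", importance_scale(weights, importance))):
        err = weighted_error(weights, quantize_block(weights, s), s, importance)
        print(f"{name}: scale={s:.4f} weighted_error={err:.4f}")
```

Every weight is still stored as a 4-bit code in both cases; only the choice of scale changes. Because the search starts from the flat scale, the importance-weighted error can only match or improve on the flat result.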