Hacker News new | ask | show | jobs
by tredre3 57 days ago
> - Importance-weighted quantization (e.g. IQ4) also provides way better PPL, KDL, etc. at the same size as a Q4.

All the Q quants from big quant providers are importance-weighted (imatrix) nowadays.

The main (possibly only?) difference between Q and IQ today is that IQ uses a lookup table to achieve better compression. That is also why IQ suffers more when it can't fully fit into VRAM.

It's important to teach people the distinction and not perpetuate wrong assumptions of the past. If one needs/wants static quants, ignoring IQ_ isn't enough.

1 comments

Thanks for bringing this up I looked into it, and if I understood correctly:

- Q4_0 (not K quant) is the traditional flat quantization - Q4_K (4-bit K quant) uses an imatrix and important weights get higher precision (5-6 bits instead of 4, but still largely 4 bits) - IQ4 uses an imatrix and important weights get an optimized scale to avoid clipping at 4-bit, but all the weights are still 4-bit

And yeah most quants nowadays are K quants which are importance weighted