| HN Mirror

This work introduces a new quantization data type, NF4 (4-bit NormalFloat), which builds on previous work on quantile quantization: its 16 values are placed at quantiles of a normal distribution, matching the roughly normal distribution of pretrained weights. So it's not simple truncation, but it's also not a GPTQ-like optimization method. Figure 3 of the paper shows NF4's accuracy improvement over FP4.
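To make "quantile quantization, not truncation" concrete, here is a minimal sketch in plain Python (stdlib only). It builds 16 levels from standard-normal quantiles, normalizes them to [-1, 1], then quantizes a block by scaling with its absolute maximum and snapping each value to the nearest level. Assumptions: the asymmetric 8 positive / 7 negative / exact-zero split and the offset 0.9677083 follow my understanding of bitsandbytes' `create_normal_map`, not the paper text itself; the function names `nf4_levels`, `quantize_block` and `dequantize_block` are hypothetical; double quantization and 4-bit packing are left out.

```python
from statistics import NormalDist


def nf4_levels(offset: float = 0.9677083) -> list[float]:
    """Hypothetical helper: 16 NF4-style levels from normal quantiles.

    Assumption: mirrors the 8 positive / 7 negative / exact-zero layout
    as I understand bitsandbytes does it; offset is taken from there too.
    """
    inv = NormalDist().inv_cdf  # standard-normal quantile function
    # 8 positive quantiles, probabilities evenly spaced from offset toward 0.5
    pos = [inv(offset - i * (offset - 0.5) / 8) for i in range(8)]
    # 7 negative quantiles, mirrored, so that zero stays exactly representable
    neg = [-inv(offset - i * (offset - 0.5) / 7) for i in range(7)]
    levels = sorted(neg + [0.0] + pos)
    m = max(abs(v) for v in levels)
    return [v / m for v in levels]  # normalize into [-1, 1]


def quantize_block(block: list[float], levels: list[float]) -> tuple[list[int], float]:
    """Absmax-scale one block, then map each value to its nearest level index."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid dividing by zero
    codes = [
        min(range(len(levels)), key=lambda j: abs(x / absmax - levels[j]))
        for x in block
    ]
    return codes, absmax


def dequantize_block(codes: list[int], absmax: float, levels: list[float]) -> list[float]:
    """Look up each 4-bit code and undo the absmax scaling."""
    return [levels[c] * absmax for c in codes]


if __name__ == "__main__":
    levels = nf4_levels()
    print([round(v, 4) for v in levels])  # denser near 0, sparser near +/-1
    codes, absmax = quantize_block([0.1, -0.5, 2.0, -2.0, 0.0], levels)
    print(codes, dequantize_block(codes, absmax, levels))
```

Compared with a uniform 4-bit grid, the levels bunch up near zero, where most normally distributed weights sit, so more of the 16 codes are spent where the data actually is.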