by woadwarrior01 48 days ago
> at Q4_K_M, stock-style quantization is retaining ~99–99.8% of BF16 accuracy

That's a tall claim. By that measure, even NVIDIA's QAD, which AFAIK is currently SOTA for 4-bit quantization (albeit NVFP4 instead of INT4), would be worse than Q4_K_M RTN quantization. :D

https://arxiv.org/abs/2601.20088