by liuliu 41 days ago
I am actually getting interested in QAT these days, especially the LSQ+ type, but it doesn't seem like people have done enough of that, at least in the open-source world: basically 2-bit / 3-bit OPD with LSQ+.

The industry has largely moved away from QAT because the hardware required to run a quantized model is an order of magnitude less than what is needed to train (or QAT) the FP model.

That's why things like AutoRound, GPTQ, and AWQ have been so popular: you don't even need enough hardware to run the original model on GPU. CPU alone is enough, thanks to their data efficiency.

Thanks. I think it is a good explanation, but it also suggests a gap. To me, QAT, if done right, is the only way to recover performance in the extreme quantization regime. The only thing that matters, of course, is whether it can work. My confidence in QAT comes from the fact that LoRA can recover most of the quality lost to quantization, but that is still different from QAT for extreme quantization, so I could be very wrong. I need to try it anyway.
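
For anyone not familiar with LSQ, here is a rough sketch of what the quantizer does at 2 bits: scale by a learnable step size, clamp, round, scale back, and pass a gradient to the step size through a straight-through estimator. This is plain Python for illustration only, not code from anyone in this thread. The names `lsq_fake_quantize` and `lsq_grad_scale` and the example weights are made up, and the learnable offset that LSQ+ adds is left out.

```python
import math

def lsq_fake_quantize(weights, step, bits):
    """Signed LSQ-style fake quantization of a list of floats.

    Returns (w_hat, d_step): the dequantized weights and, per element,
    the straight-through derivative of w_hat with respect to `step`.
    """
    q_neg = 2 ** (bits - 1)        # magnitude of the lowest level, e.g. 2 for 2-bit
    q_pos = 2 ** (bits - 1) - 1    # highest level, e.g. 1 for 2-bit
    w_hat, d_step = [], []
    for w in weights:
        ratio = w / step
        if ratio <= -q_neg:        # clamped at the bottom of the range
            q, d = -q_neg, float(-q_neg)
        elif ratio >= q_pos:       # clamped at the top of the range
            q, d = q_pos, float(q_pos)
        else:                      # inside the range: round, gradient is round(r) - r
            q = round(ratio)       # Python rounds halves to even
            d = q - ratio
        w_hat.append(q * step)
        d_step.append(d)
    return w_hat, d_step

def lsq_grad_scale(num_weights, bits):
    """Scale applied to the step-size gradient: 1 / sqrt(N * Qp)."""
    q_pos = 2 ** (bits - 1) - 1
    return 1.0 / math.sqrt(num_weights * q_pos)

if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.1, 0.4, 0.9]   # hypothetical values
    w_hat, d_step = lsq_fake_quantize(weights, step=0.5, bits=2)
    print(w_hat)
    print(d_step, lsq_grad_scale(len(weights), bits=2))
```

In a real QAT run the step size would be a trainable parameter per tensor or per channel, and `d_step` would be multiplied by the upstream gradient and by the scale from `lsq_grad_scale` before the optimizer update. With only four levels available at 2 bits, where the step size ends up matters a lot, which is presumably why learning it is attractive in this regime.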