Hacker News
by eachro 723 days ago
Training in int8 is notable (to me). I've been out of the loop on ML research for a bit, but last I recall, people mostly trained at full precision, quantized after training, and then fine-tuned the quantized model a bit.
1 comment

Dunno. It could also just mean so-called "quantization-aware training", where your weights, activations and gradients are still bf16 and get quantized to int8 just before use, the same way you'd do it during inference.

This gives you the same "no mismatch between training and prediction" property, and it was a standard technique back in the vision days (~2018).
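
For anyone who hasn't seen it, here is a rough sketch of the "quantize just before use" step in plain Python. It assumes symmetric per-tensor scaling and a straight-through estimator for the backward pass, which is a common setup but not necessarily what any particular model used. The names `fake_quant_int8` and `ste_grad` are made up for illustration.

```python
def fake_quant_int8(values):
    """Round-trip floats through int8 (assumed: symmetric, per-tensor scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Quantize: divide by the scale, round to nearest, clamp to the int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    # Dequantize so the rest of the forward pass sees the rounding error
    # that inference would see.
    dq = [qi * scale for qi in q]
    return q, dq, scale


def ste_grad(upstream_grad):
    """Straight-through estimator: treat rounding as identity when backpropagating."""
    return list(upstream_grad)


# High-precision "master" weights stay in float; only the forward pass uses dq.
weights = [0.50, -1.27, 0.013, 0.9]
q, dq, scale = fake_quant_int8(weights)
print(q)
```

The point is that the optimizer still updates the high-precision weights, while the forward pass only ever sees values that are representable in int8.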