| HN Mirror

Sure. The talk about 8bit refers to quantization-aware training. Pretty common in image models these days to reduce the impact of quantization on accuracy.

Typically this might mean that you simulate an 8bit forward pass to ensure that the model is robust to quantization ‘noise’. You still use FP16/32 for backward pass & weight updates for numerical stability.

It’s just a way to optimize the model in anticipation of future quantization. The experience of using an 8-bit Nemo quant should more closely mirror that of using the full-fat bf16 model compared to if they hadn’t used QAT.