Hacker News
by marvosyn 1973 days ago
Regarding bias: this is true, especially with the author's method, since the learned quantization ranges are fixed and accumulating biases would lead to the entire batch being clipped to 0 or 255, depending on the direction of the bias. Luckily the bias parameters are kept in int32, so the overall bias they produce will be much smaller than 2 percent. The arithmetic (rounding) errors of the individual int8 products are summed inside the matmul, so as long as they are roughly zero-mean they largely cancel, and each accumulated value is an approximately unbiased estimate of the true entry in the result matrix.
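To make the unbiasedness point concrete, here is a minimal NumPy sketch. It is not the author's method: it assumes a simple symmetric round-to-nearest int8 scheme with one fixed scale, and the helper names `quantize` and `int8_matmul` are hypothetical. It compares an int8 matmul accumulated in int32 against the float result and reports the mean error next to the RMS error.

```python
import numpy as np

def quantize(x, scale):
    """Symmetric round-to-nearest quantization to int8 (an assumed scheme)."""
    return np.clip(np.rint(x / scale), -127, 127).astype(np.int8)

def int8_matmul(a, b, scale_a, scale_b):
    """Quantize both operands, accumulate in int32, then rescale to float."""
    qa = quantize(a, scale_a).astype(np.int32)
    qb = quantize(b, scale_b).astype(np.int32)
    # Products are summed in int32, so the accumulation itself adds no error.
    return (qa @ qb) * (scale_a * scale_b)

rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(64, 256))
b = rng.uniform(-1.0, 1.0, size=(256, 64))
scale = 1.0 / 127.0  # fixed range of [-1, 1], chosen for this toy example

err = int8_matmul(a, b, scale, scale) - a @ b
mean_err = float(err.mean())                   # systematic offset (bias)
rms_err = float(np.sqrt((err ** 2).mean()))    # typical per-entry error
print("mean error:", mean_err, "rms error:", rms_err)
```

If the rounding errors really are close to zero-mean, the mean error should come out much smaller than the RMS error. A scheme that truncates instead of rounding would not have that property, so this only illustrates the round-to-nearest case.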