by pkAbstract 593 days ago
Exactly. The smaller bit widths from quantization might marginally decrease the compute required for each operation, but they do not reduce the number of operations. So quantization generally has a bigger effect on memory use than on compute.
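
To put rough numbers on that, here is a back-of-the-envelope sketch. The layer shape (4096 x 4096) and the helper name `matmul_cost` are made up for illustration, and it only counts weight storage and multiply-accumulates, nothing else.

```python
def matmul_cost(m, k, n, bits):
    """Return (multiply-accumulate count, weight bytes) for an (m x k) @ (k x n) layer."""
    macs = m * k * n                  # one multiply-accumulate per (row, inner, column) triple
    weight_bytes = k * n * bits // 8  # storage for the k x n weight matrix at this bit width
    return macs, weight_bytes

# Hypothetical layer: a single input row against a 4096 x 4096 weight matrix.
fp32_macs, fp32_bytes = matmul_cost(1, 4096, 4096, bits=32)
int8_macs, int8_bytes = matmul_cost(1, 4096, 4096, bits=8)

print(f"fp32: {fp32_macs} MACs, {fp32_bytes} weight bytes")
print(f"int8: {int8_macs} MACs, {int8_bytes} weight bytes")
```

The MAC count is identical in both cases; only the weight footprint shrinks, here by 4x.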

Except in this case they quantized both the parameters and the activations, which reduces compute time too.
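
A minimal sketch of why that matters, assuming simple symmetric per-tensor int8 quantization (not necessarily the scheme they used; `quantize` and `int_dot` are hypothetical helpers). Once both operands are integers, the inner loop is integer-only, and whether that actually runs faster depends on the hardware having fast integer paths.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # fall back to 1.0 for an all-zero input
    return [round(v / scale) for v in values], scale

def int_dot(weights, activations):
    """Dot product with both operands quantized to int8."""
    qw, sw = quantize(weights)
    qx, sx = quantize(activations)
    acc = sum(a * b for a, b in zip(qw, qx))  # integer multiply-accumulate only
    return acc * sw * sx                      # one float rescale at the end

w = [0.5, -1.0, 0.25]
x = [1.0, 2.0, -4.0]
exact = sum(a * b for a, b in zip(w, x))
approx = int_dot(w, x)
print(exact, approx)
```

If only the weights were quantized, they would have to be dequantized back to floats before the multiply, so the arithmetic itself would stay in floating point.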