by icyfox 936 days ago
Practically speaking, most models today run inference at 8-bit or 16-bit precision (occasionally 32-bit). You don't see an empirical lift from more bits of precision. Memory capacity is far more important.

If we're talking about the results, is there any reason to think it should make a difference at all?
Sometimes gradients are small but meaningful. If you constrain them to too few bits / degrees of freedom, they round away to nothing and backprop can't make use of them. This can hamper training and therefore the quality of the results.
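To make the small-gradient point concrete, here is a rough NumPy sketch. The gradient, weight, and step values are made up for illustration and don't come from any real model; it only shows how float16 rounding can erase a small value or a small update that float32 keeps.

```python
import numpy as np

# Hypothetical small-but-nonzero gradient (illustrative value only).
grad = 1e-8

# float16 cannot represent magnitudes this small, so the value rounds to zero.
grad_fp16 = np.float16(grad)
grad_fp32 = np.float32(grad)

# Hypothetical weight and update step (also illustrative).
weight = 1.0
step = 1e-4

# Near 1.0 the gap between adjacent float16 values is about 1e-3,
# so adding a 1e-4 step rounds straight back to 1.0.
updated_fp16 = np.float16(weight) + np.float16(step)
updated_fp32 = np.float32(weight) + np.float32(step)

print("gradient in fp16:", grad_fp16, "| in fp32:", grad_fp32)
print("updated weight in fp16:", updated_fp16, "| in fp32:", updated_fp32)
```

Mixed-precision training setups usually work around this with tricks like loss scaling or keeping a higher-precision copy of the weights, which is part of why low-bit inference is easier than low-bit training.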

You can also think about it as compounding errors: at any one weight index the low-order bits might not be too meaningful, but cascaded over a lot of tensor multiplications they add up.
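A quick way to see the compounding effect is to push a vector through a chain of matrix multiplications at different precisions and compare against a float64 run. This is only a toy sketch: the sizes, the seed, and the use of random orthogonal matrices (chosen so the vector's norm stays stable and any drift is rounding error) are my own assumptions, not anything from a real network.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 64, 50  # hypothetical vector size and number of chained multiplications

# Random orthogonal matrices preserve the vector norm, so differences
# between precisions come from rounding rather than from growth or decay.
weights = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(depth)]
x0 = rng.standard_normal(n)


def run_chain(dtype):
    """Apply each matrix in turn with all values stored in `dtype`.

    Returns the intermediate vector after every step, converted to float64.
    """
    x = x0.astype(dtype)
    outputs = []
    for w in weights:
        x = w.astype(dtype) @ x
        outputs.append(x.astype(np.float64))
    return outputs


reference = run_chain(np.float64)


def relative_errors(dtype):
    """Relative error against the float64 reference after each step."""
    return [
        np.linalg.norm(out - ref) / np.linalg.norm(ref)
        for out, ref in zip(run_chain(dtype), reference)
    ]


err16 = relative_errors(np.float16)
err32 = relative_errors(np.float32)

print("float16 error after first step:", err16[0], "| after last step:", err16[-1])
print("float32 error after first step:", err32[0], "| after last step:", err32[-1])
```

The per-step error is tiny at either precision; the interesting part is how much larger it is after the whole chain than after one multiplication.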

Oh, I thought we were talking about running the same calculations on different hardware.