Hacker News
by OJFord 936 days ago
If we're talking about the results, is there any reason to think it should make a difference at all?

Sometimes gradients are small but meaningful. If you constrain them to too few bits (too few degrees of freedom), they get rounded away and backprop can't carry them through successfully. That can hamper training and therefore the quality of the results.
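
To make that concrete, here's a rough sketch using only Python's standard library. The `to_half` helper and the specific magnitudes (a gradient of 1e-8, an update of 1e-4) are my own assumptions for illustration, not numbers from any real model. It rounds values to IEEE 754 half precision and shows a small gradient vanishing, and a small update being swallowed by a weight near 1.0.

```python
import struct


def to_half(x: float) -> float:
    """Hypothetical helper: round a Python float to the nearest
    IEEE 754 half-precision (16-bit) value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumed gradient magnitude, chosen only to illustrate underflow.
small_grad = 1e-8
grad_half = to_half(small_grad)  # below the smallest half-precision subnormal

# Assumed weight and update size, chosen only to illustrate lost updates.
weight = 1.0
update = 1e-4
updated_half = to_half(weight + update)  # update is under half the spacing near 1.0
updated_full = weight + update           # ordinary double precision keeps it

print("gradient in half precision:", grad_half)
print("weight after update (half):", updated_half)
print("weight after update (full):", updated_full)
```

In both cases the information is nonzero going in and gone coming out, which is the "small but meaningful" problem in miniature.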

You can also think about it as compounding errors: at any one weight index the low-order bits might not mean much, but cascaded over a lot of tensor multiplications they add up.
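
A toy version of the compounding, again only a sketch: a scalar stands in for a tensor, and the factor 1.001, the 1000 steps, and the `to_half` helper are all assumptions picked for illustration. It repeats one multiplication many times, rounding to half precision after every step, and compares the drift against the error from a single rounding.

```python
import struct


def to_half(x: float) -> float:
    """Hypothetical helper: round a Python float to the nearest
    IEEE 754 half-precision (16-bit) value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


factor = 1.001   # assumed per-step scale, for illustration only
steps = 1000     # assumed chain length, for illustration only

# Error introduced by rounding the factor once.
factor_half = to_half(factor)
step_rel_err = abs(factor_half - factor) / factor

# Run the same chain in double precision and with half-precision rounding
# applied after every multiplication.
x_full = 1.0
x_half = 1.0
for _ in range(steps):
    x_full = x_full * factor
    x_half = to_half(x_half * factor_half)

chain_rel_err = abs(x_half - x_full) / x_full

print("relative error of one rounding:", step_rel_err)
print("relative error after the chain:", chain_rel_err)
```

The single rounding is tiny on its own; the gap only becomes visible once it's been pushed through the whole chain.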

Oh, I thought we were talking about the same calculations on different hardware.