by icyfox 943 days ago
Sometimes gradients are small but meaningful. If you constrain them to too few bits / degrees of freedom, they get rounded away and backprop can't propagate them. This can hamper training and therefore the quality of the results.
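
A rough sketch of what that looks like in practice, assuming NumPy and some made-up gradient values (the numbers are purely illustrative, not from any real model): casting float32 gradients to float16 flushes the smallest one to zero.

```python
import numpy as np

# Hypothetical gradient values, chosen only for illustration.
grads = np.array([1e-3, 1e-5, 1e-8], dtype=np.float32)

# float16 has fewer mantissa bits and a much narrower exponent range.
grads_fp16 = grads.astype(np.float16)

# The smallest positive float16 value is about 6e-8, so 1e-8 rounds to zero
# while the two larger values survive (with reduced precision).
flushed_to_zero = (grads_fp16 == 0) & (grads != 0)
print(grads_fp16, flushed_to_zero)
```

A gradient that becomes exactly zero contributes nothing to the weight update, no matter how many steps you run.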

You can also think about it as compounding errors - at any one weight index the low-order bits might not be too meaningful, but cascaded over a lot of tensor multiplications they will be.
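
A toy illustration of the compounding, assuming NumPy: a scalar chain of multiplications stands in for a stack of tensor ops, and the function name, factor, and step count are arbitrary choices for the sketch. The result is rounded back to the working precision after every multiply, then compared against a float64 reference.

```python
import numpy as np

def chained_product(dtype, factor=1.001, steps=1000):
    """Multiply 1.0 by `factor` repeatedly, rounding to `dtype` each step."""
    x = dtype(1.0)
    f = dtype(factor)  # the factor itself is already rounded to dtype
    for _ in range(steps):
        x = dtype(x * f)
    return float(x)

# Python floats are float64, used here as the reference value.
reference = 1.001 ** 1000

err16 = abs(chained_product(np.float16) - reference)
err32 = abs(chained_product(np.float32) - reference)
print(err16, err32)
```

The per-step rounding error in float16 is tiny, but after a thousand multiplies the float16 result has drifted visibly from the reference while the float32 one stays close.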

1 comment

Oh, I thought we were talking about running the same calculations on different hardware.