


	by icyfox 943 days ago
	Sometimes gradients are small but meaningful. If you constrain them to too few bits / degrees of freedom, they can't be represented accurately and backprop can't propagate them successfully. This can hamper training and therefore result quality. You can also think of it as compounding errors: at any one weight index the low-order bits might not mean much, but cascaded over a lot of tensor multiplications they do.
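A minimal sketch of the effect described above, assuming IEEE 754 half precision (fp16) as the "too few bits" format; the comment does not name a specific format. The helper names `to_fp16` and `accumulate` and the constants are hypothetical, chosen only for illustration. The sketch uses plain scalar arithmetic rather than a real training loop.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(start: float, update: float, steps: int, low_precision: bool) -> float:
    """Add `update` to a weight `steps` times, optionally rounding to fp16 each step."""
    w = start
    for _ in range(steps):
        w = w + update
        if low_precision:
            # Assumption: the weight is stored in fp16 after every update.
            w = to_fp16(w)
    return w


# A tiny gradient falls below the smallest fp16 subnormal and becomes zero.
tiny_grad_fp16 = to_fp16(1e-8)

# Near 1.0 the fp16 grid spacing is 2**-10, so an update of 1e-4 is rounded
# away on every step, while the full-precision weight keeps moving.
w_full = accumulate(1.0, 1e-4, 1000, low_precision=False)
w_half = accumulate(1.0, 1e-4, 1000, low_precision=True)

print(tiny_grad_fp16, w_full, w_half)
```

Each individual update looks negligible, but over many steps the full-precision weight drifts by about 0.1 while the fp16 weight never leaves its starting value. Keeping a higher-precision copy of the weights, or scaling the loss before backprop, are common ways to avoid this.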

1 comment

Oh, I thought we were talking about the same calculations on different hardware.