


	by icyfox 983 days ago
	Practically speaking, most models today run inference at 8-bit or 16-bit precision (occasionally 32-bit, but that's rare). You don't see an empirical lift from more bits of precision. Memory size is far more important.


OJFord 983 days ago

If we're talking about the results, is there any reason to think it should make a difference at all?


icyfox 982 days ago

Sometimes gradients are small but meaningful. If you constrain them to too few bits / degrees of freedom, they get rounded away and backprop can't do its job. This can hamper training and therefore the quality of the results.
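To make the small-gradient point concrete, here's a rough sketch using only Python's standard library. The `round_to` helper and the gradient value `1e-8` are made up for illustration (not from any real model); the helper just round-trips a number through `struct`'s half-precision (`"e"`) and single-precision (`"f"`) formats to mimic storing it at that width.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round x to the nearest value representable in the given struct float format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

# Hypothetical gradient magnitude, picked to sit below fp16's smallest
# subnormal (2**-24, about 6e-8). Not taken from any real model.
grad = 1e-8

grad_fp16 = round_to("e", grad)  # "e" = IEEE 754 half precision
grad_fp32 = round_to("f", grad)  # "f" = IEEE 754 single precision

print("fp16:", grad_fp16)  # 0.0, the gradient is flushed to zero
print("fp32:", grad_fp32)  # still nonzero, close to 1e-8
```

In fp16 the value underflows to exactly zero, so whatever signal that gradient carried is gone, while fp32 keeps it.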

You can also think about it as compounding errors: at any one weight index the low-order bits might not mean much, but cascaded over a lot of tensor multiplications the rounding errors add up.
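A toy version of the compounding effect, again stdlib-only and purely illustrative: it applies the same 2x2 rotation over and over, rounding every intermediate to a given width, and compares against a float64 reference. The rotation is just a stand-in for a chain of matrix multiplications (assumption: a real network behaves differently in the details), and `round_to`, `rotate_repeatedly` and `error` are hypothetical helper names.

```python
import math
import struct

def round_to(fmt, x):
    """Round x to the nearest value representable in the given struct float format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def rotate_repeatedly(steps, fmt=None):
    """Rotate (1, 0) by 0.1 rad `steps` times, rounding every intermediate to `fmt`.

    fmt=None keeps Python's native float64 and serves as the reference.
    """
    r = (lambda v: round_to(fmt, v)) if fmt else (lambda v: v)
    c, s = r(math.cos(0.1)), r(math.sin(0.1))
    x, y = 1.0, 0.0
    for _ in range(steps):
        # One matrix-vector product, rounded after each multiply and add.
        x, y = r(r(c * x) - r(s * y)), r(r(s * x) + r(c * y))
    return x, y

def error(steps, fmt):
    """Euclidean distance between the reduced-precision result and the reference."""
    ref = rotate_repeatedly(steps)
    got = rotate_repeatedly(steps, fmt)
    return math.hypot(got[0] - ref[0], got[1] - ref[1])

# Columns: number of chained multiplications, fp16 error, fp32 error.
for steps in (1, 10, 100, 1000):
    print(steps, error(steps, "e"), error(steps, "f"))
```

After a single step the fp16 error is just the rounding of the matrix entries; after a long chain it is much larger, and much larger than the fp32 error over the same chain.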


OJFord 982 days ago

Oh, I thought we were talking about the same calculations on different hardware.
