


	by bobbylarrybobby 1086 days ago
	(Not an ML guy.) bf16 and fp16 should be comparable if the weights are of the same magnitude, but what happens in a network where the weights are poorly regularized?

1 comment

dlewis1788 1086 days ago

Someone commented below that with enough batchnorm/layernorm/etc. and/or gradient clipping you can manage it, but BF16 just makes life easier if you can live without some precision.
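To make the range-versus-precision trade-off concrete, here is a rough sketch that rounds a few values to each format using only the Python standard library. The helper names `to_fp16` and `to_bf16` are made up for this illustration and are not from any ML framework. The bf16 conversion assumes round-to-nearest-even on the top 16 bits of a float32 and only handles finite inputs that fit in float32.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (hypothetical helper)."""
    try:
        # struct's "e" format is IEEE 754 binary16: 5 exponent bits, 10 mantissa bits.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct raises instead of saturating; hardware would give +/-inf here.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (hypothetical helper).

    Assumes x is finite and representable as float32. bfloat16 is the top
    16 bits of a float32: 8 exponent bits, 7 mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that are about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    # A large value, a tiny value, and a value close to 1.
    for value in (70000.0, 1e-8, 1.001):
        print(f"{value!r:>10}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

With these helpers, 70000 overflows to inf in fp16 but becomes 70144 in bf16, and 1e-8 flushes to 0 in fp16 while bf16 keeps it to within about 0.4%. Near 1 the picture flips: 1.001 rounds to 1.0009765625 in fp16 but to exactly 1.0 in bf16. That is the "live without some precision" part: bf16 tolerates badly scaled weights and gradients, at the cost of roughly three fewer mantissa bits.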
