HN Mirror



	by nl 1251 days ago
	The lack of ECC memory is almost certainly not a factor. If you can train at FP8, your model will recover from a single flipped bit somewhere.
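
To make the single-bit-flip claim concrete, here is a minimal sketch (not from the thread) of what one flipped bit does to a stored weight. It assumes IEEE 754 float32, since the Python standard library has no FP8 type, and `flip_bit` is a hypothetical helper written for this illustration. The size of the perturbation depends heavily on which bit flips: a low mantissa bit is a tiny nudge, while a high exponent bit is a very large one.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return value (stored as float32) with one bit of its encoding flipped.

    Hypothetical helper for illustration; bit 0 is the lowest mantissa bit,
    bit 31 is the sign bit.
    """
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly one bit, then reinterpret back as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped

weight = 0.5
for bit in (0, 22, 23, 30, 31):
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

Under these assumptions, most of the 32 bit positions produce a small relative change, which is the intuition behind treating a rare flip as noise; whether a given training run actually recovers from a flip in a high exponent bit is not something this sketch shows.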

1 comment

Loranubi 1251 days ago

I mean you could even view bit flips as a regularization technique like dropout...


dahart 1251 days ago

Yeah I hear it’s common practice now to avoid synchronizing GPU training kernels in order to speed things up, and it has positive regularization benefits and little downside.
