Hacker News
tambre
1251 days ago
I would be surprised if it did. But you probably shouldn't do professional work on GPUs that lack ECC memory.
nl
1251 days ago
The lack of ECC memory is almost certainly not a factor. If you can train at FP8, your model will recover from a single flipped bit somewhere.
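How much a single flipped bit disturbs a value depends on which bit it hits. The minimal sketch below illustrates that with IEEE 754 float32, because Python has no native FP8 type. The helper name `flip_bit_f32` is hypothetical, and the sketch does not model where or how often flips occur in real GPU memory.

```python
import struct


def flip_bit_f32(x: float, bit: int) -> float:
    """Round x to float32, flip one bit of its binary32 encoding, and return the result.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the exponent,
    and bit 31 is the sign bit.
    """
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Flip the chosen bit.
    bits ^= 1 << bit
    # Reinterpret the modified integer as a float32 again.
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


if __name__ == "__main__":
    # Flipping different bits of 1.0 gives very different results.
    for bit in (0, 22, 23, 30, 31):
        print(bit, flip_bit_f32(1.0, bit))
```

For 1.0, flipping bit 0 changes the value by only 2^-23, and flipping bit 22 or bit 23 gives 1.5 or 0.5. Flipping bit 30, the top exponent bit, turns it into `inf`, and flipping bit 31 gives -1.0.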
Loranubi
1251 days ago
I mean, you could even view bit flips as a regularization technique, like dropout...
dahart
1251 days ago
Yeah, I hear it’s common practice now to avoid synchronizing GPU training kernels to speed things up, and that it has positive regularization benefits and little downside.