Hacker News
by dahart 1251 days ago
Yeah, I hear it’s common practice now to skip synchronizing GPU training kernels to speed things up, and that doing so has a regularizing effect with little downside.