by utopcell 3486 days ago
Aligned float loads and stores are atomic on all architectures that matter. Also, unsynchronized parameter updates for SGD have actually been studied in [1], which showed that they barely affect convergence when the updates are sparse.

In the limit, convergence would indeed suffer, since every update would be computed in parallel from the same stale parameters.

[1] Recht, Benjamin, et al. "Hogwild: A lock-free approach to parallelizing stochastic gradient descent." Advances in Neural Information Processing Systems. 2011.
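
For anyone who wants to see the pattern, here is a minimal sketch of unsynchronized SGD in the spirit of [1], not the paper's implementation. The function name `hogwild_sgd`, the least-squares loss, and all hyperparameters are my own assumptions for illustration. Note that CPython's GIL serializes most of this work, so the sketch shows the lock-free access pattern rather than a real parallel speedup.

```python
import threading
import numpy as np

def hogwild_sgd(X, y, n_threads=4, epochs=20, lr=0.05, seed=0):
    """Least-squares SGD where all threads write to one shared vector, no locks.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    n, d = X.shape
    w = np.zeros(d)  # shared parameters, deliberately not protected by a lock

    def worker(tid):
        rng = np.random.default_rng(seed + tid)
        for _ in range(epochs * n // n_threads):
            i = rng.integers(n)
            idx = np.flatnonzero(X[i])       # touch only the nonzero coordinates
            xi = X[i, idx]
            err = xi @ w[idx] - y[i]         # may read slightly stale values
            w[idx] -= lr * err * xi          # unsynchronized write to shared w

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

The sparser each example is, the less often two threads touch the same coordinate of `w`, which is the intuition behind tolerating the occasional lost update.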

1 comment

There's another paper describing the "Hogbatch" approach that shows more precisely how adding cores affects accuracy: http://www.ece.ubc.ca/~matei/papers/ipdps16.pdf

In summary, accuracy per pass suffers slightly, but since the speedup is close to linear for the first dozen or so cores, each pass is much faster to run. The result is that the wall-clock time to reach a given level of accuracy is much shorter despite the slightly lower accuracy per pass.
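
To make the trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical and chosen only to illustrate the arithmetic; none of them come from the paper, and `time_to_accuracy` is a made-up helper.

```python
def time_to_accuracy(passes_needed, seconds_per_pass_1core, speedup):
    """Wall-clock seconds to reach a target accuracy (illustrative model only)."""
    return passes_needed * seconds_per_pass_1core / speedup

# Hypothetical: the serial run needs 10 passes at 60 s each.
serial = time_to_accuracy(passes_needed=10, seconds_per_pass_1core=60.0, speedup=1.0)

# Hypothetical: the 8-core run needs 2 extra passes, but each pass is 8x faster.
parallel = time_to_accuracy(passes_needed=12, seconds_per_pass_1core=60.0, speedup=8.0)

print(serial, parallel)  # 600.0 90.0
```

Under these assumed numbers, needing 20% more passes is easily outweighed by the per-pass speedup.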