by vvipgupta 1220 days ago
The same is true for the consistency of data (aka gradients) while training large ML models. Asynchronous SGD is as good as synchronous SGD, and maybe even faster: https://papers.nips.cc/paper/2011/file/218a0aefd1d1a4be65601...
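
To make the sync vs. async distinction concrete, here is a toy sketch, not the method from the linked paper. It fits a hypothetical line y = 2x + 1 two ways: a synchronous loop that averages the workers' gradients before each update, and threads that update a shared weight list with no lock, so reads may be stale and writes may clobber each other. The names (`make_data`, `train_sync`, `train_async`), the learning rate, and the step counts are all made up for illustration. Python threads also don't run in parallel under the GIL, so this only shows the tolerance for inconsistent updates, not any speedup.

```python
import random
import threading

TRUE_W = (2.0, 1.0)  # hypothetical ground truth: y = 2x + 1


def make_data(n=200, seed=0):
    """Noiseless samples (x, y) from the hypothetical line above."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, TRUE_W[0] * x + TRUE_W[1]) for x in xs]


def grad(w, x, y):
    """Gradient of 0.5 * (w[0]*x + w[1] - y)**2 with respect to w."""
    err = w[0] * x + w[1] - y
    return err * x, err


def train_sync(data, workers=4, rounds=2000, lr=0.1, seed=1):
    """Synchronous SGD: all gradients are computed at the same weights,
    averaged, and applied as a single update per round."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(rounds):
        grads = [grad(w, *rng.choice(data)) for _ in range(workers)]
        w[0] -= lr * sum(g[0] for g in grads) / workers
        w[1] -= lr * sum(g[1] for g in grads) / workers
    return w


def train_async(data, workers=4, steps=2000, lr=0.1):
    """Asynchronous SGD: each thread reads and writes the shared weights
    without any lock, so updates can be stale or partially lost."""
    w = [0.0, 0.0]  # shared state, deliberately unprotected

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps):
            g0, g1 = grad(w, *rng.choice(data))
            w[0] -= lr * g0  # read-modify-write, not atomic
            w[1] -= lr * g1

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    data = make_data()
    print("sync :", train_sync(data))
    print("async:", train_async(data))
```

Both versions should land close to (2, 1) on this problem. The async one gets there without any coordination between workers, which is the point of the comment above.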