by vvipgupta 1220 days ago
The same is true for data (aka gradient) consistency when training large ML models. Asynchronous SGD can be as good as synchronous SGD, and maybe even faster:
https://papers.nips.cc/paper/2011/file/218a0aefd1d1a4be65601...
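
To make the claim a bit more concrete, here is a toy sketch, not the method from the linked paper. It assumes asynchrony can be modeled as "stale" gradients: each update uses parameters that are a few steps old, roughly what a worker sees when others have written in the meantime. The dataset, the `staleness` parameter, and all function names are made up for illustration.

```python
import random
from collections import deque

def make_data(n=200, seed=0):
    # Hypothetical noiseless linear-regression data: y = true_w . x
    rng = random.Random(seed)
    true_w = [2.0, -1.0, 0.5]
    xs = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(true_w, row)) for row in xs]
    return xs, ys, true_w

def grad(w, x, y):
    # Gradient of 0.5 * (w . x - y)^2 with respect to w
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def sgd(xs, ys, staleness=0, steps=5000, lr=0.02, seed=1):
    # staleness=0 is ordinary (synchronous-style) SGD.
    # staleness=k computes each gradient from parameters up to k steps old,
    # an assumed stand-in for asynchronous workers reading outdated weights.
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    history = deque([list(w)], maxlen=staleness + 1)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        stale_w = history[0]  # oldest snapshot still kept
        g = grad(stale_w, xs[i], ys[i])
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        history.append(list(w))
    return w

def mse(w, xs, ys):
    # Mean squared prediction error over the whole dataset
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(xs)

if __name__ == "__main__":
    xs, ys, _ = make_data()
    for k in (0, 4):
        print("staleness", k, "mse", mse(sgd(xs, ys, staleness=k), xs, ys))
```

On this small problem both settings should end up with a very low error, which is the "as good" half of the claim. The "faster" half comes from workers not waiting on each other, and a single-threaded simulation like this cannot show that.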