	by vvipgupta 1220 days ago
	The same is true for data (i.e., gradient) consistency when training large ML models. Asynchronous SGD is as good as synchronous SGD, and may even be faster: https://papers.nips.cc/paper/2011/file/218a0aefd1d1a4be65601...
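
To make the claim concrete, here is a rough toy sketch, not the method from the linked paper. It fakes asynchrony in a single process by computing each gradient from parameters that are a few updates old, which is roughly what a worker sees when other workers have written in the meantime. The names (`make_data`, `sgd`, `staleness`) and the tiny linear-regression problem are made up for illustration, and real asynchronous training involves actual concurrent workers, which this does not model.

```python
import random
from collections import deque


def make_data(n=200, seed=0):
    """Toy 1-D regression data: y = 3x - 1 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x - 1.0 + rng.gauss(0.0, 0.01) for x in xs]
    return xs, ys


def grad(w, b, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    err = w * x + b - y
    return err * x, err


def sgd(xs, ys, staleness, steps=5000, lr=0.05, seed=1):
    """SGD where each gradient is computed from parameters that are up to
    `staleness` updates old. staleness=0 behaves like ordinary (synchronous)
    SGD; staleness>0 is a crude single-process stand-in for asynchrony."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    # Keep the last staleness+1 parameter snapshots; history[0] is the oldest.
    history = deque([(w, b)], maxlen=staleness + 1)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        stale_w, stale_b = history[0]          # what a lagging worker would read
        gw, gb = grad(stale_w, stale_b, xs[i], ys[i])
        w -= lr * gw                           # update is applied to the live parameters
        b -= lr * gb
        history.append((w, b))
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the fitted line on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs, ys = make_data()
    for s in (0, 4):
        w, b = sgd(xs, ys, staleness=s)
        print(f"staleness={s}: w={w:.3f} b={b:.3f} mse={mse(w, b, xs, ys):.6f}")
```

On this toy problem, both the fresh and the stale variant should end up near w = 3, b = -1. The speed argument is that asynchronous workers never wait at a barrier, so more updates happen per second; the sketch only shows that modest staleness need not hurt the result, and too much staleness or too large a learning rate can still make it diverge.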