| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by throwawaymaths 1097 days ago
	Normally, ML training is via back propagation, which is a synchronous technique. If you try to trivially parallelize, it doesn't work, for reasons (tm). This lets you train a machine learning model of arbitrary size (bigger than can fit on a GPU, or even a multigpu node) using an actor-based distributed technique. There is a slight training cycles count penalty but it's way less than the cost of coordination.