by throwawaymaths
1097 days ago
Normally, ML training is done via backpropagation, which is a synchronous technique. If you try to parallelize it trivially, it doesn't work, for reasons (tm). This lets you train a machine learning model of arbitrary size (bigger than can fit on a GPU, or even a multi-GPU node) using an actor-based distributed technique. There is a slight penalty in the number of training cycles, but it's way less than the cost of coordination.
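
To make the "synchronous" part concrete, here is a minimal sketch using a hypothetical two-layer scalar model. It only illustrates the ordering constraint in plain backpropagation (the first layer's gradient cannot be computed until the second layer has passed its gradient back); it is not the actor-based technique itself, and every name in it is made up for illustration.

```python
# Toy model (hypothetical, for illustration only):
#   h = w1 * x, y = w2 * h, loss = 0.5 * (y - t) ** 2

def forward(w1, w2, x):
    """Run both layers in order; layer 2 needs layer 1's output."""
    h = w1 * x
    y = w2 * h
    return h, y

def backward(w1, w2, x, t):
    """Return (dL/dw1, dL/dw2) via the chain rule, last layer first."""
    h, y = forward(w1, w2, x)
    dy = y - t        # dL/dy, available only after the full forward pass
    dw2 = dy * h      # layer 2's weight gradient
    dh = dy * w2      # gradient handed back to layer 1
    dw1 = dh * x      # layer 1 has to wait for dh before it can compute this
    return dw1, dw2

grad_w1, grad_w2 = backward(w1=0.5, w2=2.0, x=3.0, t=1.0)
```

If the two layers lived on different machines, layer 1 would sit idle until `dh` arrived, which is the coordination cost mentioned above.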