by yonkshi
2740 days ago
There is currently no NN learning algorithm that can handle massively parallel training. We can use simple fixes such as averaging the gradients across workers, but they have severe limitations, and those limitations grow as you scale up. Currently, even when training on just a handful of GPUs in parallel, each step has to wait for every GPU to finish its batch and return its gradients to the CPU before the next batch can start, so your idea of infinite scaling is just a pipe dream at the moment. Sure, Python can't be run on a million pages, but heck, no NN architecture can even handle more than a dozen parallel computations in a stable manner yet.
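To make the "mean gradient" point concrete, here is a toy sketch of one synchronous data-parallel step, in plain Python with no GPUs involved. Everything in it is an assumption for illustration: the one-parameter model y = w * x, the made-up data, and the helper names `shard_gradient`, `sync_step` and `train`. It only shows where the waiting happens: the weight cannot be updated until every worker's gradient has been collected.

```python
# Toy simulation of synchronous data-parallel SGD with averaged gradients.
# Hypothetical setup: fit w in y = w * x by minimizing mean squared error.

def shard_gradient(w, shard):
    """Gradient of the mean squared error on one worker's shard of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_step(w, shards, lr):
    """One synchronous step: all workers must report before w changes."""
    # In a real setup each gradient would be computed on a separate GPU;
    # collecting the full list is the point where fast workers sit idle
    # waiting for the slowest one.
    grads = [shard_gradient(w, shard) for shard in shards]
    mean_grad = sum(grads) / len(grads)
    return w - lr * mean_grad

def train(shards, steps=200, lr=0.01, w=0.0):
    """Run a fixed number of synchronous steps and return the final weight."""
    for _ in range(steps):
        w = sync_step(w, shards, lr)
    return w

# Made-up data generated from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w_final = train(shards)
```

With equal-sized shards, the averaged gradient is the same as the gradient over the whole batch, so adding workers here only changes how long each step takes, not what the step computes. That is the sense in which the scheme is simple but bounded by its slowest worker.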