by Houshalter
3017 days ago
SGD is embarrassingly parallel as well. You can train a net on several different examples simultaneously and combine the gradients or learned weights. The reason it isn't done more often is that moving huge numbers of gradients or weights between computers takes a lot of bandwidth. There's been all sorts of research into compressing them or reducing their precision. However, this is a problem for evolutionary algorithms as well.
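To make the "combine the gradients" part concrete, here's a rough sketch in plain Python. It is a toy, not how any real framework does it: the workers are simulated in a loop, the model is a one-parameter linear fit, and the names `gradient` and `parallel_sgd_step` are made up for illustration. The point is just that each shard's gradient is computed independently, and the only coordination is the averaging step, which is where the bandwidth cost would show up on real hardware.

```python
import random

def gradient(w, batch):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def parallel_sgd_step(w, batch, n_workers, lr):
    # Split the batch into equal shards, one per (simulated) worker.
    shard_size = len(batch) // n_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(n_workers)]
    # Each worker computes its gradient without looking at the others.
    # On real hardware this loop is the part that runs in parallel.
    grads = [gradient(w, shard) for shard in shards]
    # Combining step: every worker's gradient has to be sent somewhere
    # and averaged. For a big net this is the expensive transfer.
    avg_grad = sum(grads) / n_workers
    return w - lr * avg_grad

random.seed(0)
# Toy data generated from y = 3x, so the fitted weight should approach 3.
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(64))]

w = 0.0
for _ in range(200):
    w = parallel_sgd_step(w, data, n_workers=4, lr=0.5)
print(w)
```

With equal-sized shards, the averaged gradient is the same as the gradient over the whole batch, so splitting the work doesn't change the update. Here the thing being shipped around is a single float; for a real net it would be one value per weight, per worker, per step.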