Hacker News new | ask | show | jobs
by breuleux 3017 days ago
On the other hand, even if evolutionary algorithms require a lot of samples, they are embarrassingly parallel: you can easily try all samples simultaneously. If you have enough resources to throw at the problem, it can be faster (although more resource-intensive) to estimate the gradient this way than to compute an accurate gradient analytically.
1 comments

SGD is embarrassingly parallel as well. You can train a net on several different examples simultaneously and combine the gradients or learned weights.

The reason it's not done so much is because the bandwidth of moving huge numbers of gradients or weights between computers is pretty significant. There's been all sorts of research into compressing them or reducing the precision. However this is a problem for evolutionary algorithms as well.

Not exactly. The required bandwidth for the evolutionary strategy is actually very small: if every node knows each other's random seed, they can reconstruct the best model themselves, using the seed of whichever node declares the best result. There is no need to transfer any weights. Of course, this trick only works if it's cheap to compute the weights other nodes are using, which is not the case for SGD. So evolutionary strategies have an advantage here, even if it's not a decisive one.
Well that would work for simple hillclimbing. In a full genetic algorithm, every member of the population is the product of thousands of previous individuals.
I mean, we were talking about simple hill climbing.

Still, even more complex systems still do not require particularly high bandwidth. You only need to broadcast the fitness of the individuals, and from there each node can independently recalculate and combine the best ones.