Hacker News
by ssivark 2949 days ago
I don't see why stochastic gradient descent (SGD) has to be worse than evolutionary algorithms. In high-dimensional spaces there is a huge space of possible "mutations"; SGD just biases which mutations get picked, using the gradient computed on a minibatch. That sounds a lot like evaluating the fitness of a mutated population on a minibatch and culling the members with low fitness. In fact, there are many demonstrations that the stochastic nature of SGD (which comes from the use of minibatches) is crucial for effective learning.
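To make the analogy concrete, here is a rough sketch on a toy problem of my own choosing (noise-free linear regression). Everything in it is an assumption for illustration: the names `sgd_step`, `evo_step`, and `run`, and the hyperparameters, are made up and not taken from any paper or library. Both update rules see the same kind of minibatch. One follows the minibatch gradient; the other samples random mutations and keeps whichever candidate scores best on that minibatch.

```python
import random

def make_data(n, dim, rng):
    """Synthetic linear-regression data: y = w_true . x, with no noise."""
    w_true = [rng.gauss(0, 1) for _ in range(dim)]
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(w_true, row)) for row in xs]
    return list(zip(xs, ys))

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, batch):
    """Mean squared error of weights w on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in batch) / len(batch)

def sgd_step(w, batch, lr):
    """One SGD step: move against the minibatch gradient of the MSE."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def evo_step(w, batch, rng, pop_size, sigma):
    """One evolutionary step: mutate the parent, score every candidate
    on the minibatch, and keep only the fittest (lowest loss)."""
    population = [w] + [
        [wi + rng.gauss(0, sigma) for wi in w] for _ in range(pop_size)
    ]
    return min(population, key=lambda cand: loss(cand, batch))

def run(method, steps=200, dim=10, batch_size=16, seed=0):
    """Train from zero weights; return (initial, final) loss on all data."""
    rng = random.Random(seed)
    data = make_data(256, dim, rng)
    w = [0.0] * dim
    start = loss(w, data)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # the source of stochasticity
        if method == "sgd":
            w = sgd_step(w, batch, lr=0.05)
        else:
            w = evo_step(w, batch, rng, pop_size=20, sigma=0.05)
    return start, loss(w, data)

if __name__ == "__main__":
    for method in ("sgd", "evo"):
        start, final = run(method)
        print(f"{method}: loss {start:.4f} -> {final:.4f}")
```

The only structural difference is how the candidate update is proposed: `sgd_step` uses one gradient-directed "mutation", while `evo_step` draws 20 undirected ones and culls all but the best. This is a toy, so it says nothing about which approach scales better.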