by raindeer2 232 days ago
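The first bit is why it is called stochastic gradient descent. At each step you follow the gradient of the loss on a randomly chosen minibatch rather than on the whole dataset, so every step uses a noisy estimate of the true gradient. It basically makes you "vibrate" down along the gradient.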
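A minimal sketch of the idea, assuming a toy 1-D linear regression with squared-error loss. The function name `sgd_linear_fit`, the synthetic data, and the hyperparameters (`lr`, `batch_size`, `epochs`) are all illustrative choices, not anything from the thread. Each update uses only the gradient on a random minibatch, so the path toward the minimum is noisy rather than smooth.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b with minibatch SGD on mean squared error (toy example)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw fresh random minibatches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the MSE on this minibatch only: a noisy
            # estimate of the gradient over the full dataset.
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                gw += 2 * err * xs[i]
                gb += 2 * err
            gw /= len(batch)
            gb /= len(batch)
            # Step against the minibatch gradient.
            w -= lr * gw
            b -= lr * gb
    return w, b

# Hypothetical data: y = 3x + 1 plus small Gaussian label noise.
noise_rng = random.Random(42)
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 1 + noise_rng.gauss(0, 0.1) for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

With a constant learning rate and noisy labels, the estimates keep jittering slightly around the least-squares solution instead of settling exactly; shrinking `lr` over time or growing `batch_size` reduces that "vibration".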