by mjaskowski 3744 days ago
Yes, we try to approximate the Q function with a neural network, which is basically an enhanced version of gradient-descent Sarsa.
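
For anyone who hasn't seen gradient-descent Sarsa, here is a rough sketch of a single update step. To keep it short it assumes a linear approximator, Q(s, a) = w · phi(s, a), as a stand-in for the neural network; the function names, the feature vectors and the default step size and discount are all made up for illustration.

```python
def q_value(weights, features):
    """Linear estimate of Q(s, a): dot product of weights and phi(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_update(weights, features, reward, next_features,
                 alpha=0.1, gamma=0.99, done=False):
    """Return new weights after one gradient-descent Sarsa step.

    features      -- phi(s, a) for the state-action pair just taken
    next_features -- phi(s', a') for the next pair the agent actually chose
    alpha, gamma  -- illustrative step size and discount (assumed values)
    """
    # Bootstrapped target; no bootstrap term when the episode has ended.
    target = reward if done else reward + gamma * q_value(weights, next_features)
    td_error = target - q_value(weights, features)
    # For a linear model the gradient of Q with respect to w is phi(s, a).
    return [w + alpha * td_error * x for w, x in zip(weights, features)]
```

With a neural network the last line turns into a backpropagation step on the same error term, but the shape of the update stays the same.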

The main trick to notice is that you can't feed consecutive frames to the network as a mini-batch, because they would be highly correlated and would derail stochastic gradient descent.

So we keep many frames (and all other necessary information) in memory and draw these experiences uniformly at random to form a mini-batch that becomes the input to the neural network.
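
A minimal sketch of that memory, in case it helps. This is only an illustration and not the code from any particular implementation: the `Transition` fields, the `ReplayMemory` class and the capacity are assumed names and values.

```python
import random
from collections import deque, namedtuple

# Hypothetical record for one stored experience; field names are illustrative.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size buffer that drops the oldest experience once it is full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform draw without replacement, so a batch mixes experiences from
        # many different points in time rather than consecutive frames.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Typical use would be to `push` after every environment step and, once the buffer holds at least `batch_size` items, call `sample(batch_size)` before each gradient step.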