by mjaskowski 3745 days ago
Note that a neural network is just a very complex function.

You usually think of Q as a function (S, A) -> (Expected accumulated future reward)

which is equivalent to S -> A -> (Expected accumulated future reward)

The neural network is S -> (A -> (Expected accumulated future reward)) or, if you wish, the output layer of the neural network consists of |A| neurons, one per action. Each outputs the (Expected accumulated future reward) for its action, given the current state.
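To make the two shapes concrete, here is a minimal sketch in plain Python. The names `q_network` and `q_value`, the layer sizes, and the random untrained weights are all assumptions for illustration, not anything from a real agent:

```python
import math
import random

STATE_DIM = 4      # assumed size of the state vector
NUM_ACTIONS = 3    # |A|, assumed number of discrete actions
HIDDEN = 8         # assumed hidden layer width

rng = random.Random(0)
# Randomly initialised weights; a real agent would learn these.
W1 = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(HIDDEN)]
W2 = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_ACTIONS)]

def q_network(state):
    """S -> (A -> value): one forward pass returns one value per action."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def q_value(state, action):
    """(S, A) -> value: the uncurried view, indexing into the output layer."""
    return q_network(state)[action]

state = [0.1, -0.2, 0.3, 0.0]
q_values = q_network(state)  # list of |A| numbers
# Picking the best action needs only this single forward pass.
greedy_action = max(range(NUM_ACTIONS), key=lambda a: q_values[a])
```

The practical upside of the S -> (A -> value) shape is that choosing the greedy action takes one forward pass instead of |A| of them.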


Thanks!

So what we are saying is that a neural network can be used as the implementation of the Q-function? I.e., a Q-function is by definition only a mapping of (S, A) pairs to an expected future reward. We can do this using a traditional style like value iteration or backpropagation, or we can use a neural network? And it's just a matter of implementation?

Yes, we try to approximate the Q function with a neural network, which is basically an enhanced version of gradient-descent Sarsa.
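For reference, a rough sketch of a single gradient-descent Sarsa step, assuming a linear approximator instead of a neural network to keep it short. The helper names `q_hat` and `sarsa_update`, the feature vectors, and the values of `alpha` and `gamma` are all hypothetical:

```python
def q_hat(weights, features):
    """Linear approximation: Q(s, a) is the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def sarsa_update(weights, feats, reward, next_feats, alpha=0.1, gamma=0.9, done=False):
    """One semi-gradient Sarsa step; returns the new weight vector."""
    # Bootstrapped target uses the value of the next (state, action) pair.
    target = reward if done else reward + gamma * q_hat(weights, next_feats)
    td_error = target - q_hat(weights, feats)
    # For a linear model, the gradient of q_hat w.r.t. the weights is the feature vector.
    return [w + alpha * td_error * x for w, x in zip(weights, feats)]

weights = [0.0, 0.0]
# Hypothetical feature vectors for (s, a) and (s', a').
weights = sarsa_update(weights, [1.0, 0.0], reward=1.0, next_feats=[0.0, 1.0])
```

With a neural network the same idea applies, except that the gradient comes from backpropagation through the network rather than being the feature vector itself.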

The main trick to notice is that you can't provide consecutive frames as minibatches, as these would be highly correlated and would derail stochastic gradient descent.

So we keep many frames (and all other necessary information) in memory and draw these experiences uniformly at random to form a minibatch that becomes the input to the neural network.
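A minimal sketch of such a memory, assuming each stored experience is a (state, action, reward, next_state, done) tuple. The `ReplayMemory` class, its capacity, and the dummy transitions are made up for illustration:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size store of past transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform draw without replacement, so a minibatch mixes
        # experiences from many different points in time.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=1000)
for t in range(50):
    # Dummy transitions: state t, action 0, reward 1.0, next state t + 1.
    memory.push(t, 0, 1.0, t + 1, False)

minibatch = memory.sample(8)
```

Because the draw is uniform over everything stored, neighbouring frames rarely end up in the same minibatch, which is what decorrelates the gradient steps.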

Stronger than that - you can think of neural networks as universal function approximators. So this is just a particular function to approximate.

See the suggestively named "Universal approximation theorem" for details.