by 2bitencryption 3745 days ago
Thanks!

So what we are saying is that a neural network can be used as the implementation for the q-function? I.e., a q-function is by definition only a mapping of (S,A) pairs to an expected future reward. We can do this using a traditional style like value iteration or back propagation, or we can use a neural network? And it's just a matter of implementation?

2 comments

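Yes, we try to approximate the Q-function with a neural network. It is basically an enhanced version of gradient-descent Sarsa, except that the update target uses the max over the next actions (Q-learning) rather than the action actually taken.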
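To make "approximate the Q-function with a network" concrete, here is a minimal sketch in NumPy. It is not the setup from any particular paper: the class name `TinyQNetwork`, the layer sizes, and the learning rate are all made up for illustration, and the update shown is a plain semi-gradient Q-learning step on a single transition.

```python
import numpy as np


class TinyQNetwork:
    """Hypothetical one-hidden-layer network: state vector in, one Q-value per action out."""

    def __init__(self, state_dim, n_actions, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, hidden))
        self.b2 = np.zeros(n_actions)
        self.lr = lr

    def _forward(self, state):
        hidden = np.tanh(self.W1 @ state + self.b1)
        return hidden, self.W2 @ hidden + self.b2

    def q_values(self, state):
        """Return the estimated Q(state, a) for every action a."""
        return self._forward(state)[1]

    def update(self, state, action, reward, next_state, done, gamma=0.99):
        """One semi-gradient step on 0.5 * (Q(s, a) - target)^2; returns the TD error."""
        # The target is treated as a constant (no gradient flows through it).
        target = reward
        if not done:
            target += gamma * np.max(self.q_values(next_state))

        hidden, q = self._forward(state)
        td_error = q[action] - target

        # Backpropagate through the chosen action's output only.
        grad_hidden = td_error * self.W2[action]
        grad_pre = grad_hidden * (1.0 - hidden ** 2)  # derivative of tanh

        self.W2[action] -= self.lr * td_error * hidden
        self.b2[action] -= self.lr * td_error
        self.W1 -= self.lr * np.outer(grad_pre, state)
        self.b1 -= self.lr * grad_pre
        return td_error
```

Calling `update` repeatedly on the same terminal transition pulls `q_values(state)[action]` toward the reward, which is all "a network as the Q-function" means at this scale. A table would store that number directly; the network stores weights that produce it.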

The main trick to notice is that you can't feed consecutive frames in as a mini-batch: they would be highly correlated, and that would derail stochastic gradient descent.

So we keep many frames (and all other necessary information) in memory and draw these experiences uniformly at random to form a mini-batch that becomes the input to the neural network.
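A rough sketch of that memory, usually called a replay buffer. The class name `ReplayBuffer`, the tuple layout `(state, action, reward, next_state, done)`, and the fixed-capacity overwrite policy are assumptions for illustration, not details given in this thread.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.next_index = 0  # slot that gets overwritten once the buffer is full
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition  # overwrite the oldest entry
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # correlation between consecutive frames.
        return self.rng.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)
```

In a training loop you would call `add` after every environment step and, once `len(buffer)` is at least the batch size, call `sample` to get the mini-batch for the next gradient step.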

Stronger than that - you can think of neural networks as universal function approximators. So this is just a particular function to approximate.

See the suggestively named "Universal approximation theorem" for details.
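A small numerical illustration of the idea, not a proof. The theorem is about existence of good weights; this sketch only shows that a wider hidden layer fits a simple target better. Everything here is an assumption chosen for the demo: the target `sin(x)`, random fixed hidden weights, and an output layer fitted by least squares instead of gradient descent.

```python
import numpy as np

# Shared pool of random hidden-unit parameters, so a wider network
# always contains the narrower one's units.
rng = np.random.default_rng(0)
w_pool = rng.normal(size=50)
b_pool = rng.normal(size=50)


def fit_one_hidden_layer(x, y, n_hidden):
    """Fit y with n_hidden fixed tanh units plus a bias; only the output weights are learned."""
    hidden = np.tanh(np.outer(x, w_pool[:n_hidden]) + b_pool[:n_hidden])
    features = np.column_stack([hidden, np.ones_like(x)])
    out_weights, *_ = np.linalg.lstsq(features, y, rcond=None)
    return features @ out_weights


x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)

mse_small = np.mean((fit_one_hidden_layer(x, y, 2) - y) ** 2)
mse_large = np.mean((fit_one_hidden_layer(x, y, 50) - y) ** 2)
print(f"2 hidden units:  MSE = {mse_small:.2e}")
print(f"50 hidden units: MSE = {mse_large:.2e}")
```

The Q-function is just another target in place of `sin(x)`. The catch is that in reinforcement learning you never get the true target values, only bootstrapped estimates of them.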