by mjaskowski
3745 days ago
Note that a neural network is just a very complex function. You usually think of Q as a function (S, A) -> (expected accumulated future reward), which is equivalent to S -> A -> (expected accumulated future reward). The neural network is S -> (A -> (expected accumulated future reward)).
Or, if you wish: the output layer of the neural network consists of |A| neurons, and each one gives the expected accumulated future reward of taking its action, given the current state.
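To make the currying point concrete, here is a minimal sketch in plain Python. The names (`curry`, `QNetwork`, `greedy_action`), the tiny one-hidden-layer architecture, the random weights and the sizes (4 state features, 3 actions) are all hypothetical choices for illustration, not anything from the comment. It shows the same Q viewed three ways: as (S, A) -> R, curried as S -> (A -> R), and as a network whose output layer has one neuron per action.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[float, ...]


def curry(q: Callable[[State, int], float]) -> Callable[[State], Callable[[int], float]]:
    """Turn Q: (S, A) -> R into the equivalent S -> (A -> R)."""
    return lambda s: (lambda a: q(s, a))


class QNetwork:
    """Hypothetical tiny MLP: state -> list of |A| Q-value estimates (untrained)."""

    def __init__(self, state_dim: int, hidden: int, num_actions: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(num_actions)]
        self.b2 = [0.0] * num_actions

    def forward(self, s: State) -> List[float]:
        """S -> vector indexed by A: output neuron a estimates Q(s, a)."""
        h = [math.tanh(sum(w * x for w, x in zip(row, s)) + b) for row, b in zip(self.w1, self.b1)]
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]

    def q(self, s: State, a: int) -> float:
        """Recover the uncurried (S, A) -> R view by indexing the output layer."""
        return self.forward(s)[a]

    def greedy_action(self, s: State) -> int:
        """One forward pass scores every action; pick the highest-valued one."""
        outputs = self.forward(s)
        return max(range(len(outputs)), key=outputs.__getitem__)


net = QNetwork(state_dim=4, hidden=8, num_actions=3)
state = (0.1, -0.2, 0.3, 0.0)
q_s = curry(net.q)(state)  # an A -> R function for this one state
```

A practical reason for the |A|-output layout: choosing the best action needs only one forward pass per state, instead of one pass per (state, action) pair as an (S, A) -> R network would require.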
So what we are saying is that a neural network can be used as the implementation of the Q-function? I.e., a Q-function is by definition only a mapping of (S, A) pairs to an expected future reward. We can compute it in a traditional way, such as value iteration or back propagation, or we can use a neural network? And it's just a matter of implementation?
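One way to see the "matter of implementation" framing is to put two implementations behind the same interface and drive both with the same update rule. The sketch below assumes a Q-learning-style target r + γ·max_b Q(s', b), which the thread does not spell out; `QFunction`, `TabularQ`, `LinearQ`, `q_learning_step`, `GAMMA` and `ALPHA` are hypothetical names and values chosen for illustration. A linear approximator stands in for the neural network to keep the example short.

```python
from typing import Dict, List, Protocol, Tuple

State = Tuple[float, ...]
GAMMA = 0.9  # hypothetical discount factor
ALPHA = 0.1  # hypothetical learning rate


class QFunction(Protocol):
    """Anything that maps (state, action) -> expected accumulated future reward."""

    def value(self, s: State, a: int) -> float: ...

    def update(self, s: State, a: int, target: float) -> None: ...


class TabularQ:
    """Lookup-table implementation: one stored number per (s, a) pair."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[State, int], float] = {}

    def value(self, s: State, a: int) -> float:
        return self.table.get((s, a), 0.0)

    def update(self, s: State, a: int, target: float) -> None:
        # Move the stored entry a fraction ALPHA toward the target.
        old = self.value(s, a)
        self.table[(s, a)] = old + ALPHA * (target - old)


class LinearQ:
    """Function-approximation implementation: one weight vector per action."""

    def __init__(self, state_dim: int, num_actions: int) -> None:
        self.w: List[List[float]] = [[0.0] * state_dim for _ in range(num_actions)]

    def value(self, s: State, a: int) -> float:
        return sum(w * x for w, x in zip(self.w[a], s))

    def update(self, s: State, a: int, target: float) -> None:
        # Semi-gradient step: treat target as a constant and descend on
        # 0.5 * (target - Q(s, a))^2 with respect to self.w[a].
        err = target - self.value(s, a)
        self.w[a] = [w + ALPHA * err * x for w, x in zip(self.w[a], s)]


def q_learning_step(q: QFunction, s: State, a: int, r: float, s_next: State, num_actions: int) -> None:
    """The same update rule, regardless of how q is implemented."""
    target = r + GAMMA * max(q.value(s_next, b) for b in range(num_actions))
    q.update(s, a, target)


s, s_next = (1.0, 0.0), (0.0, 1.0)
tab, lin = TabularQ(), LinearQ(state_dim=2, num_actions=2)
for q in (tab, lin):
    q_learning_step(q, s, a=0, r=1.0, s_next=s_next, num_actions=2)
```

The calling code is identical for both, which supports the "just implementation" view. One practical difference remains: an update to `LinearQ` for one state also shifts its estimates for every other state that shares features, whereas `TabularQ` changes only the single entry it touched.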