by johbjo
1812 days ago
It can depend on what the agent "sees" and how many time-steps away the "consequences" are. If the ghosts are so far away that any action takes t time-steps to have consequences for the agent, the actions are pseudo-random because there is no reward to optimize on. The number of outcomes is branching_factor^t (very large), which makes the action-values at t=0 (where the agent chooses between two or three actions) almost uniformly random.
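A rough way to see the scale of this, as a toy sketch rather than the model under discussion: assume every step has the same branching factor, everything after the first action is a uniformly random rollout, and exactly one outcome at depth t carries a penalty of -1. The function name `action_value_gap` and these assumptions are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def action_value_gap(branching_factor: int, t: int) -> Fraction:
    """Gap between action values at t=0 in a toy uniform tree.

    Assumptions (illustrative only): each first action leads to
    branching_factor ** (t - 1) equally likely outcomes, and a single
    outcome under one action has reward -1 while all others have 0.
    That action's value is then -1 / branching_factor ** (t - 1),
    and every other action's value is 0.
    """
    outcomes_per_action = branching_factor ** (t - 1)
    return Fraction(1, outcomes_per_action)


if __name__ == "__main__":
    for t in (1, 3, 5, 7):
        total_outcomes = 3 ** t
        print(t, total_outcomes, float(action_value_gap(3, t)))
```

Under these assumptions the gap shrinks geometrically with t, so with any noise in the estimates the choice at t=0 looks close to uniform.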
I experimented with different time horizons, mostly looking 3-7 steps ahead.
As for the 'reward', it was implicit in the model: if the ghosts caught you, your ability to influence the state of the world dropped to 0.
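To make the implicit reward concrete, here is a minimal sketch of one way such a score could be computed. It is not the code from the experiment. It assumes a square grid with no interior walls, ghosts that stand still, and "ability to influence the state of the world" measured as the number of distinct cells the agent can reach within the horizon. The names `step`, `influence`, and `choose_move` are hypothetical.

```python
from typing import Set, Tuple

Pos = Tuple[int, int]
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(pos: Pos, move: str, size: int) -> Pos:
    """Apply a move on a size x size grid; bumping into the edge is a no-op."""
    dx, dy = MOVES[move]
    nx, ny = pos[0] + dx, pos[1] + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return pos


def influence(pos: Pos, ghosts: Set[Pos], size: int, horizon: int) -> int:
    """Count distinct cells reachable within `horizon` steps; 0 if caught."""
    if pos in ghosts:
        return 0  # caught: no remaining ability to change the world
    seen = {pos}
    frontier = {pos}
    for _ in range(horizon):
        # Paths that step onto a ghost end there and are not counted.
        frontier = {step(p, m, size) for p in frontier for m in MOVES} - ghosts - seen
        seen |= frontier
    return len(seen)


def choose_move(pos: Pos, ghosts: Set[Pos], size: int, horizon: int) -> str:
    """Pick the move whose resulting cell keeps the most cells reachable."""
    return max(
        MOVES,
        key=lambda m: influence(step(pos, m, size), ghosts, size, horizon - 1),
    )


if __name__ == "__main__":
    print(choose_move((2, 2), {(2, 1)}, size=5, horizon=5))
```

In this sketch, when every ghost lies beyond the horizon the scores depend only on the grid edges, so several moves can tie; `max` then simply returns the first of them.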