Hacker News
by johbjo 1812 days ago
It can depend on what the agent "sees" and how many time-steps away the "consequences" are. If the ghosts are so far away that any action takes t time-steps to have consequences for the agent, the choice of action is effectively pseudo-random, because there is no reward to optimize on.
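To make this concrete, here is a minimal Python sketch under assumptions of our own, not the original setup: a hypothetical 1-D corridor stands in for the maze, a single ghost steps one cell toward the agent after every move, and the only reward is -1 on capture. Action values come from a depth-limited best-case search; when the ghost is farther away than the horizon can reach, every action scores 0 and the tie is broken at random.

```python
import random

ACTIONS = (-1, 0, 1)  # left, stay, right in a hypothetical 1-D corridor
WIDTH = 20            # assumed corridor size: cells 0..WIDTH-1


def step(agent, ghost, action):
    """Move the agent (clamped to the corridor), then move the ghost one cell toward it."""
    agent = min(max(agent + action, 0), WIDTH - 1)
    if ghost < agent:
        ghost += 1
    elif ghost > agent:
        ghost -= 1
    return agent, ghost


def value(agent, ghost, depth):
    """Best achievable reward within `depth` more steps: -1 if caught, else 0."""
    if agent == ghost:
        return -1.0
    if depth == 0:
        return 0.0
    return max(value(*step(agent, ghost, a), depth - 1) for a in ACTIONS)


def action_values(agent, ghost, horizon):
    """Value of each first action, looking `horizon` steps ahead in total."""
    return {a: value(*step(agent, ghost, a), horizon - 1) for a in ACTIONS}


def choose(agent, ghost, horizon, rng=random):
    """Pick a best action; if all actions tie (no reward in reach), this is a coin flip."""
    q = action_values(agent, ghost, horizon)
    best = max(q.values())
    return rng.choice([a for a, v in q.items() if v == best])
```

For example, `action_values(10, 0, 3)` returns 0.0 for all three actions because the ghost is out of reach, so `choose` picks uniformly at random; `action_values(10, 8, 3)` gives -1.0 for stepping toward the ghost and 0.0 for the other two actions.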

The number of possible outcomes, branching_factor^t, is very large, which makes the action-values at t=0 (where the agent chooses between two or three actions) nearly indistinguishable, so the choice is almost uniformly random.
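The branching argument can be checked by brute force in the same kind of hypothetical corridor (again our assumption, not the commenter's setup). There are len(ACTIONS)**t action sequences in total (3**7 = 2187 for a 7-step horizon), and here each first action's value is the fraction of its continuations that end in capture, i.e. its value under a uniformly random continuation. The first action fixes only one of the t choices, and when no sequence reaches the ghost all three values are exactly equal.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # left, stay, right in a hypothetical 1-D corridor
WIDTH = 20            # assumed corridor size: cells 0..WIDTH-1


def step(agent, ghost, action):
    """Move the agent (clamped to the corridor), then move the ghost one cell toward it."""
    agent = min(max(agent + action, 0), WIDTH - 1)
    if ghost < agent:
        ghost += 1
    elif ghost > agent:
        ghost -= 1
    return agent, ghost


def n_sequences(horizon):
    """Number of distinct action sequences of length `horizon`: branching_factor ** t."""
    return len(ACTIONS) ** horizon


def capture_rate(agent, ghost, first_action, horizon):
    """Fraction of all len(ACTIONS)**(horizon-1) continuations after `first_action`
    that end with the ghost catching the agent."""
    caught = 0
    total = 0
    for rest in product(ACTIONS, repeat=horizon - 1):
        a, g = step(agent, ghost, first_action)
        hit = a == g
        for act in rest:
            if hit:
                break  # capture ends the episode
            a, g = step(a, g, act)
            hit = a == g
        caught += int(hit)
        total += 1
    return caught / total
```

With the agent at cell 10 and the ghost at cell 0, `capture_rate(10, 0, a, 4)` is 0.0 for every action: none of the `n_sequences(4)` = 81 sequences gets close enough, so the values carry no information about which first action is better.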


Yes, you are right.

I experimented with different time horizons, mostly looking 3-7 steps ahead.

In terms of the 'reward', that was implicit in the model: if the ghosts caught you, your ability to influence the state of the world dropped to 0.
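One possible reading of this implicit reward, which is our assumption rather than the model described above: score each action by how many distinct world states (agent and ghost positions) the agent can still end up in within the lookahead horizon without being caught. Capture ends the episode, so that count drops to 0. The corridor, `WIDTH`, and the function names are hypothetical, and this is a crude reachability count, not a formal measure.

```python
ACTIONS = (-1, 0, 1)  # left, stay, right in a hypothetical 1-D corridor
WIDTH = 20            # assumed corridor size: cells 0..WIDTH-1


def step(agent, ghost, action):
    """Move the agent (clamped to the corridor), then move the ghost one cell toward it."""
    agent = min(max(agent + action, 0), WIDTH - 1)
    if ghost < agent:
        ghost += 1
    elif ghost > agent:
        ghost -= 1
    return agent, ghost


def reachable(agent, ghost, depth):
    """Set of (agent, ghost) states reachable after `depth` more steps without capture."""
    if agent == ghost:
        return set()  # caught: no further influence over the world
    if depth == 0:
        return {(agent, ghost)}
    states = set()
    for a in ACTIONS:
        states |= reachable(*step(agent, ghost, a), depth - 1)
    return states


def influence_values(agent, ghost, horizon):
    """Score each first action by the number of states still reachable within `horizon` steps."""
    return {a: len(reachable(*step(agent, ghost, a), horizon - 1)) for a in ACTIONS}
```

With the ghost far away, `influence_values(10, 0, 3)` gives 5 for every action, so nothing distinguishes them; with the ghost close, `influence_values(10, 8, 3)` gives 0 for stepping into it, 1 for standing still and 2 for moving away. The `horizon` argument plays the role of the 3-7 step lookahead; the recursion visits 3**horizon leaves, which is still cheap at 7 steps.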