|
|
|
|
|
by fxtentacle
1654 days ago
|
|
You can mathematically prove for a lot of different algorithms (including PPO, DQN, IMPALA) that given enough experience with the game world, they will eventually converge to the optimal policy. It's just that the "enough experience" part might be so large that it's practically useless. If I remember correctly, the DeepMind x UCL RL Lecture Series proves the underlying Bellman equation in this video: https://www.youtube.com/watch?v=zSOMeug_i_M As for "hidden information" games, I thought the trick was to concatenate the current state with all past states and treat that as the new state, thereby making it an MDP. |
|
(History stacking may turn POMDPs into MDPs, but I don't know if they handle the specially adversarial nature of games like poker. That's quite different from stacking ALE frames.)