by gwern
3163 days ago
DeepMind has already done a bunch of work on 'deep models' of environments that you can plan over. Plug one of those in and you get 'model-predictive control' and planning, and this tree extension to policy gradients would probably work as well. It would be pretty interesting to see what happens if you tried that sort of hybrid on ALE.