Check out DAgger [2], SEARN [3] and LOLS [1] (LOLS is available in Vowpal Wabbit's search capabilities). There's a lot of interesting material there on mimicking optimal policies, local optimality, joint learning and related ideas :D

The whole point of playing is making your decisions jointly, each one dependent on the previous decisions. If you train your model that way, it'll make its decisions trying to minimize future regret.
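To make the "train on your own decisions" idea concrete, here's a rough DAgger-style sketch on a made-up toy game (walk along a line and end near a target). Everything in it is an assumption for illustration: the game, the `expert` rule, the lookup-table learner and its default action are hypothetical, and for simplicity the learner drives every rollout instead of being mixed with the expert. It is not the setup from the paper or from Vowpal Wabbit.

```python
from collections import Counter, defaultdict

# Hypothetical toy game: start at 0, move -1 or +1, try to stay near TARGET.
TARGET, HORIZON, START = 3, 6, 0


def expert(state):
    """Reference policy we want to mimic: always step toward TARGET."""
    return 1 if state < TARGET else -1


def fit(dataset):
    """Train a lookup-table 'classifier': majority expert label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in training fall back to an arbitrary default of -1.
    return lambda state: table.get(state, -1)


def dagger(iterations=5):
    dataset = []                  # aggregated (state, expert action) pairs
    learner = lambda state: -1    # untrained learner: always steps left
    for _ in range(iterations):
        state = START
        for _ in range(HORIZON):
            # Label the state the LEARNER reached with the expert's action...
            dataset.append((state, expert(state)))
            # ...but keep playing with the learner's own (possibly bad) move.
            state += learner(state)
        # Retrain on everything collected so far.
        learner = fit(dataset)
    return learner
```

The point of the loop is that the training states come from the learner's own earlier decisions, so it gets corrected exactly where its mistakes take it. In this toy, each round pushes the learner one state further along the expert's path before it hits an unseen state.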

Local optimality is a very nice property. It means that if you play out a game, no single change to any of the previous moves could have led to a better result. Of course, local optimality is hard to get in general, but for some problems it's pretty easy to achieve if your reference policy (the one you're mimicking) is good and your features are adequate (which they likely will be if you use neural networks).
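Here's a small sketch of what checking that property could look like, under my own assumptions: a deterministic toy game (walk on a line, loss is the distance from a target at the end), and "changing a move" meaning you deviate once at some step and then keep following the same policy. The names `step`, `loss`, `rollout` and `is_locally_optimal` are hypothetical helpers, not anything from the LOLS paper or Vowpal Wabbit.

```python
ACTIONS = (-1, 1)   # hypothetical toy action set
HORIZON = 4         # number of moves per game
TARGET = 3          # position we would like to end at


def step(state, action):
    """Deterministic transition of the toy game."""
    return state + action


def loss(final_state):
    """Loss is only observed at the end of the game."""
    return abs(final_state - TARGET)


def rollout(policy, state, steps):
    """Follow the policy for a number of steps and return the final state."""
    for _ in range(steps):
        state = step(state, policy(state))
    return state


def is_locally_optimal(policy, start=0):
    """True if no single one-step deviation strictly lowers the final loss."""
    base = loss(rollout(policy, start, HORIZON))
    state = start
    for t in range(HORIZON):
        chosen = policy(state)
        for action in ACTIONS:
            if action == chosen:
                continue
            # Deviate once at step t, then follow the policy to the end.
            final = rollout(policy, step(state, action), HORIZON - t - 1)
            if loss(final) < base:
                return False
        state = step(state, chosen)
    return True


def greedy(state):
    """Always step toward TARGET."""
    return 1 if state < TARGET else -1


def always_left(state):
    """A deliberately bad policy that ignores the target."""
    return -1
```

With these definitions, `is_locally_optimal(greedy)` holds, while `is_locally_optimal(always_left)` fails because swapping the very first move already gives a lower loss. Ties don't count as improvements here, which is a choice I made for the sketch.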

Of course, Flappy Bird is a pretty local game, so all of this might be overkill :D

AlphaGo wasn't trained jointly over Go games, so it's lacking in that regard, but the power of neural networks compensates for it. Who can imagine what AlphaGo would be like if they had trained their policy networks jointly? :D

A nice introduction to LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[1]: http://arxiv.org/pdf/1502.02206.pdf

[2]: http://arxiv.org/pdf/1011.0686.pdf

[3]: http://searn.hal3.name