by sinwave 4260 days ago

While this complaint generally has validity, their paper [1] does IMO present an advance; it's not just handing a bunch of labeled data off to a large neural network.

IIRC (forgive me, I read the paper a few weeks ago) the solution is at its core a reinforcement learning system, with the deep net only making up the component that predicts expected reward from a (state, action) pair. With that in hand, there remains the non-trivial RL problem of balancing "exploration vs. exploitation" while learning good strategies to play the game(s). While NNs have been used in this capacity before, I believe that, as other comments have mentioned, using a deep net to learn a mapping from a high-dimensional state-action space (e.g., the state of the game represented as the pixels of the screen at a particular time) to expected reward, in real time, was indeed an advance, both theoretical and technical.
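
To make the "exploration vs. exploitation" part concrete, here is a rough sketch of epsilon-greedy Q-learning on a toy problem. This is not the paper's method: a small lookup table stands in for the deep net, and the 5-state chain environment, the function names, and all the hyperparameters are made up for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniformly random action
    best = max(q_values)
    # Exploit: pick among the highest-valued actions, breaking ties at random.
    return rng.choice([a for a, v in enumerate(q_values) if v == best])


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Q-learning on a hypothetical chain: action 0 = left, action 1 = right.

    Reaching the last state pays reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    # q[s][a] estimates expected discounted reward for taking a in s.
    # In the paper a deep net plays this role; a table is assumed here.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        done = False
        while not done:
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # One-step target: reward now plus discounted best estimate later.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best-known action in each non-terminal state."""
    return [max(range(2), key=lambda a: row[a]) for row in q[:-1]]
```

Calling `greedy_policy(q_learning_chain())` gives the learned action per state. The hard part the paper tackles is doing the same kind of update when the "state" is raw screen pixels and the table has to be replaced by a learned function.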

And, oh yeah, I just remembered that a University of Texas research group is doing work in this area too (there was a recent paper [2] from Peter Stone and others).

(Edited for clarity)

(Edited again to suggest another paper).

[1] - http://arxiv.org/pdf/1312.5602.pdf

[2] - http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/TCIAI...