Hacker News new | ask | show | jobs
by daveguy 3747 days ago
Raw pixels. And the score. The score was separate, or rather a signal representing increased score. Relevant quote from the paper:

"The emulator’s internal state is not observed by the agent; instead it observes an image xt ∈ Rd from the emulator, which is a vector of raw pixel values representing the current screen. In addition it receives a reward rt representing the change in game score."

The paper: http://arxiv.org/abs/1312.5602

1 comments

It would be great if it could also learn to tell if it was doing fine.
Agreed. I think that is one of the major drawbacks/limitation/unaddressed aspects of deep learning algorithms -- they are primarily supervised learning. Supervised in the sense that you have to explicitly identify good and bad examples. Determination of what is good and bad itself (figuring out that number at the upper right side of the screen is a score) would be a major breakthrough with implications far beyond game playing. DNN has been a breakthrough with much better accuracy and discrimination capabilities of a complex neural network. It still requires that the researcher point out what is good and bad. We still need a just-as-significant breakthrough in unsupervised learning.