The results were pretty great, so it would be fascinating to see this work with Matt's version of SC2 as mentioned elsewhere in this thread: https://news.ycombinator.com/item?id=11326119
Raw pixels, plus the score. The score came in separately, or rather as a reward signal representing the change in score. Relevant quote from the paper:
"The emulator’s internal state is not observed by the agent; instead it observes an image x_t ∈ R^d from the emulator,
which is a vector of raw pixel values representing the current screen. In addition it receives a reward r_t representing the change in game score."
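To make the quoted setup concrete, here is a rough sketch of what the agent gets at each step. This is my own illustration, not code from the paper: the names `Observation` and `ScoreDeltaWrapper` are hypothetical, and I'm assuming the emulator hands back a grayscale frame as rows of pixel intensities plus the absolute score.

```python
from dataclasses import dataclass
from typing import List

# Assumption: a frame is a list of rows of grayscale pixel intensities.
Screen = List[List[int]]


@dataclass
class Observation:
    pixels: List[float]  # x_t: the screen flattened into a vector in R^d
    reward: float        # r_t: change in game score since the previous step


class ScoreDeltaWrapper:
    """Hypothetical wrapper turning (screen, absolute score) into (x_t, r_t)."""

    def __init__(self, initial_score: float = 0.0) -> None:
        self._last_score = initial_score

    def observe(self, screen: Screen, score: float) -> Observation:
        # Flatten the 2D screen row by row into a single pixel vector.
        pixels = [float(p) for row in screen for p in row]
        # The agent never sees the absolute score, only how much it changed.
        reward = score - self._last_score
        self._last_score = score
        return Observation(pixels=pixels, reward=reward)


wrapper = ScoreDeltaWrapper()
first = wrapper.observe([[0, 255], [128, 64]], score=0)
second = wrapper.observe([[0, 255], [128, 64]], score=10)
third = wrapper.observe([[0, 255], [128, 64]], score=10)
```

The point is that the agent is never told where the score lives on screen. It only gets the pixel vector and a number that is nonzero on the steps where the score changed.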
Agreed. I think that is one of the major unaddressed limitations of deep learning algorithms: they are primarily supervised learners, in the sense that you have to explicitly identify good and bad examples. Working out what counts as good or bad on its own (for example, figuring out that the number at the upper right of the screen is a score) would be a major breakthrough, with implications far beyond game playing. Deep neural networks have been a breakthrough in accuracy and discrimination, but they still require the researcher to point out what is good and bad. We still need an equally significant breakthrough in unsupervised learning.
"The emulator’s internal state is not observed by the agent; instead it observes an image x_t ∈ R^d from the emulator, which is a vector of raw pixel values representing the current screen. In addition it receives a reward r_t representing the change in game score."
The paper: http://arxiv.org/abs/1312.5602