by argonaut 3715 days ago
Your argument is very unclear, and your last sentence makes no sense. What exactly is the problem with jointly learning perception and control? They are widely considered to be intertwined problems in robotics. Not independent at all. But for what it's worth, you will find researchers working on all sorts of different approaches.

Clearly this competition is an attempt to build off successes in reinforcement learning with agents that play games using only images and scores.

I don't know if your parent comment is correct, but their argument is really easy to follow. I'll put what I think their argument is in different words.

"This contest is like making Google extend AlphaGo to include a robot and image recognition, and making the robot place the stones." Obviously that has "nothing to do with" the game and is just "how AlphaGo gets the information."

But Go (and Tetris, etc.) are games of perfect information where perceiving the game state is not a challenge. Having access to the internal data structure representing the Go or Tetris board is effectively the same as having to scrape it off a screen and recognize it, or do real-world image recognition.
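To make the "scraping it off a screen is the same thing" point concrete, here is a toy sketch. Everything in it (the `CELL` size, the `render` and `scrape` names, the 0/255 pixel encoding) is made up for illustration and is not taken from any real Go or Tetris program. It only shows that for a clean grid game, the pixels-to-board step can be an exact, trivial inverse of rendering.

```python
CELL = 4  # pixels per board cell (arbitrary choice for this sketch)


def render(board):
    """Draw each board cell as a CELL x CELL block of 0/255 pixels."""
    height = len(board) * CELL
    width = len(board[0]) * CELL
    return [
        [255 if board[r // CELL][c // CELL] else 0 for c in range(width)]
        for r in range(height)
    ]


def scrape(pixels):
    """Recover the board by sampling the centre pixel of each cell."""
    rows = len(pixels) // CELL
    cols = len(pixels[0]) // CELL
    half = CELL // 2
    return [
        [1 if pixels[r * CELL + half][c * CELL + half] > 127 else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# A tiny hypothetical Tetris-like board: 1 = filled, 0 = empty.
board = [
    [0, 1, 0],
    [1, 1, 0],
]
recovered = scrape(render(board))
```

Here `recovered` equals `board` exactly, so no information is gained or lost by going through the screen. A first-person 3D view like Doom's has no such simple inverse, which is where the disagreement in this thread comes from.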

If your parent comment is wrong it's because that's not the kind of game Doom is.

So what you consider "intertwined" really isn't, unless Google has not even built a Go engine, since a human was doing the perception.

(again, I am just saying your parent's argument is easy to follow, not that they're correct in this case.)

I got that part. Neither you nor the original poster gave a good reason why that's a bad idea. After all, "end to end" learning is the holy grail of AI.

The only reason people have historically separated the two (and many still do) is that it's been too hard! But that's the point of research: solving hard problems.

Thank you for the excellent rephrasing.

>But Go (and tetris etc) are games of perfect information where perception of the game state is not a challenge.

In general, "perception of the game state" is not a challenge, at least according to good game design principles (e.g. in danmaku shmups, perception can be a challenge because of visual effects that are not really part of the game, but this is seen as poor game design, much as being unable to tell backgrounds from platforms in a run-and-jump game is bad design). There are games where perceiving the game state is itself a game mechanic, but Doom isn't really one of them.

But even in Doom, you can separate the two tasks quite neatly. The vision task essentially aims to reconstruct a model of the world, but in a video game this model comes for free. You can trivially limit the information an agent gets to what it would get as a player (games like MGS already do this, albeit in a very simplistic way). It's fairly easy to write a function that computes what is visible, what sounds a player would hear, etc. You can then rephrase the problem as making an AI that can only access this function, and this wouldn't change anything.
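A rough sketch of what such a function could look like, on a toy 2D grid rather than Doom's actual engine. All the names here (`Entity`, `line_cells`, `visible`, `observe`, `hearing_radius`) are hypothetical, and the rules are assumptions for illustration: line of sight is a Bresenham line that must not cross a wall (`#`), and "hearing" is just Manhattan distance to anything not already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    x: int
    y: int


def line_cells(x0, y0, x1, y1):
    """Bresenham's line: grid cells from (x0, y0) to (x1, y1), inclusive."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def visible(grid, px, py, tx, ty):
    """True if no wall lies strictly between the player and the target."""
    between = line_cells(px, py, tx, ty)[1:-1]
    return all(grid[y][x] != "#" for x, y in between)


def observe(grid, player, entities, hearing_radius=3):
    """The only view of the world the agent is allowed to query."""
    px, py = player
    seen = [e.name for e in entities if visible(grid, px, py, e.x, e.y)]
    heard = [
        e.name for e in entities
        if e.name not in seen
        and abs(e.x - px) + abs(e.y - py) <= hearing_radius
    ]
    return {"seen": seen, "heard": heard}


# Toy map: '#' is a wall, '.' is floor. The player stands at x=0, y=1.
grid = [
    "......",
    "..#...",
    "......",
]
entities = [Entity("medkit", 1, 0), Entity("zombie", 3, 1), Entity("imp", 4, 1)]
obs = observe(grid, (0, 1), entities)
```

With this map the medkit is seen, the zombie is hidden behind the wall but close enough to be heard, and the imp is neither seen nor heard, so `obs` is `{"seen": ["medkit"], "heard": ["zombie"]}`. The agent never touches `grid` or `entities` directly, only `observe`.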

So for the AI community, I think a more interesting question would have been to design an AI over such a function.

If the contestants are using deep learning, I don't see why it should be any more difficult to generate a meaningful, low-dimensional representation of the game-state from raw pixels than from an abstract view input.