by logicrook 3715 days ago
The question seems flawed: having an AI make decisions based only on visual information conflates how the AI gets its information (visually) with what information it gets (limited information, similar to what the player has). These are two different problems that can be solved completely independently. The first makes no sense for a game (think how intensive the computation would be), while the second could be very interesting, since it would mean designing the AI more like a natural player. The catch, however, is that it's only a "could": in itself, there is no reason to think such AIs would make the game better in any way (over "cheating" AIs).
3 comments

Your argument is very unclear, and your last sentence makes no sense. What exactly is the problem with jointly learning perception and control? They are widely considered to be intertwined problems in robotics. Not independent at all. But for what it's worth, you will find researchers working on all sorts of different approaches.

Clearly this competition is an attempt to build off successes in reinforcement learning with agents that play games using only images and scores.

I don't know if your parent comment is correct, but their argument is really easy to follow. I'll put what I think their argument is in different words.

"This contest is like making Google extend AlphaGo with a robot and image recognition, and making the robot place the stones." Obviously that has "nothing to do with" the game and is just "how AlphaGo gets the information."

But Go (and tetris etc) are games of perfect information where perception of the game state is not a challenge. If you have access to the internal data structure representing the Go or tetris board that's the same as having to scrape it off of a screen and recognize it or do real world image recognition.

If your parent comment is wrong it's because that's not the kind of game Doom is.

So what you consider "intertwined" really isn't, unless Google has not even built a Go engine, since a human was doing the perception.

(again, I am just saying your parent's argument is easy to follow, not that they're correct in this case.)

I got that part. Neither you nor the original poster gave a good reason why that's a bad idea. After all, "end to end" learning is the holy grail of AI.

The only reason people have historically separated the two (and many still do) is that it's been too hard! But that's the point of research: solving hard problems.

Thank you for the excellent rephrasing.

>But Go (and tetris etc) are games of perfect information where perception of the game state is not a challenge.

In general, "perception of the game state" is not a challenge, at least according to good game design principles. (In danmaku shmups, for example, perception can be a challenge because of visual effects that are not really part of the game, but this is seen as poor game design, much like being unable to tell backgrounds from platforms in a run&jump.) There are games where perception of the game state is a game mechanic, but Doom isn't really one.

But even in Doom, you can separate the two tasks quite neatly. The vision task essentially aims to reconstruct a model of the world, but in a video game this model comes for free. You can trivially limit the information an agent gets to what it would get as a player (in games like MGS this is already the case, albeit in a very simplistic way). It's fairly easy to write a function that computes what is visible, what sounds a player would hear, etc. You can then rephrase the problem as making an AI that can only access this function, and this wouldn't change anything.

So for the AI community, I think a more interesting question would have been to design an AI over such a function.
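To make that concrete, here is a minimal sketch of such a function in Python. Everything in it is an assumption for illustration, not anything from Doom's actual code: the `Entity` and `Wall` types, the `observe` function, a flat 2D map, a simple field-of-view cone, and walls as line segments that block sight but not sound.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical game object; not a real Doom structure.
    name: str
    x: float
    y: float
    noise_radius: float = 0.0  # how far its sound carries (0 = silent)


@dataclass
class Wall:
    # A wall is a line segment from (x1, y1) to (x2, y2).
    x1: float
    y1: float
    x2: float
    y2: float


def _segments_cross(p, q, a, b):
    """Return True if segment p-q properly crosses segment a-b."""
    def orient(u, v, w):
        return (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])

    return (orient(a, b, p) * orient(a, b, q) < 0
            and orient(p, q, a) * orient(p, q, b) < 0)


def observe(player, facing_deg, fov_deg, entities, walls):
    """Return only what the player could plausibly see or hear."""
    seen, heard = [], []
    for e in entities:
        dx, dy = e.x - player.x, e.y - player.y
        # Hearing: in this sketch sound ignores walls and only depends on distance.
        if e.noise_radius > 0 and math.hypot(dx, dy) <= e.noise_radius:
            heard.append(e.name)
        # Sight, step 1: the entity must be inside the field-of-view cone.
        angle = math.degrees(math.atan2(dy, dx))
        diff = (angle - facing_deg + 180) % 360 - 180  # signed angle in [-180, 180)
        if abs(diff) > fov_deg / 2:
            continue
        # Sight, step 2: no wall may cross the line from player to entity.
        blocked = any(
            _segments_cross((player.x, player.y), (e.x, e.y),
                            (w.x1, w.y1), (w.x2, w.y2))
            for w in walls
        )
        if not blocked:
            seen.append(e.name)
    return {"seen": seen, "heard": heard}


player = Entity("player", 0.0, 0.0)
entities = [
    Entity("imp", 5.0, 0.0),                          # in front, unobstructed
    Entity("zombie", -5.0, 0.0, noise_radius=10.0),   # behind, but noisy
    Entity("soldier", 10.0, 1.0),                     # in front, behind a wall
]
walls = [Wall(7.0, -5.0, 7.0, 5.0)]
obs = observe(player, facing_deg=0.0, fov_deg=90.0, entities=entities, walls=walls)
```

The agent would be given only `obs`, never the full entity list, so it has the player's information without any pixel processing.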

If the contestants are using deep learning, I don't see why it should be any more difficult to generate a meaningful, low-dimensional representation of the game-state from raw pixels than from an abstract view input.
The data the agent perceives is going to be very different from what would be obtained from access to Doom's internals, as a 'normal' AI would have. The challenge is to bridge the image recognition and intelligence together efficiently and effectively.

Of course it does not make sense to use this in a real game implementation. The idea is that such technology could (with further development) be used in real-world scenarios and applications. The problem is simply posed in a game environment to make it interesting and easier to approach.

Think about players hiding behind foliage. The AI cheats that are used now make this a lot less practical / fun than a per-pixel visibility test.
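A rough sketch of what a per-pixel visibility test could look like, in Python. The setup is assumed for illustration only: the renderer is imagined to produce an "ID buffer" (each pixel holds the id of the frontmost object) plus a mask of the pixels the target would cover if nothing were in front of it. The names `visible_fraction` and `can_see` and the threshold value are hypothetical.

```python
def visible_fraction(id_buffer, target_mask, target_id):
    """Fraction of the target's on-screen pixels that are not occluded."""
    total = 0
    shown = 0
    for row_ids, row_mask in zip(id_buffer, target_mask):
        for pixel_id, covered in zip(row_ids, row_mask):
            if covered:
                total += 1
                if pixel_id == target_id:
                    shown += 1
    return shown / total if total else 0.0


def can_see(id_buffer, target_mask, target_id, threshold=0.5):
    """The AI notices the target only if enough of it is actually on screen."""
    return visible_fraction(id_buffer, target_mask, target_id) >= threshold


PLAYER, FOLIAGE = 1, 2  # hypothetical object ids; 0 is background

# Frontmost object per pixel: foliage covers three of the player's four pixels.
id_buffer = [
    [0, 0, 0, 0],
    [0, FOLIAGE, FOLIAGE, 0],
    [0, FOLIAGE, PLAYER, 0],
    [0, 0, 0, 0],
]
# Pixels the player would occupy if nothing were in front of them.
target_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

fraction = visible_fraction(id_buffer, target_mask, PLAYER)
spotted = can_see(id_buffer, target_mask, PLAYER)
```

A geometry-only line-of-sight check that ignores foliage would report the player as visible here, whereas this test leaves them hidden until more of them shows through.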