HN Mirror

by sdhgaiojfsa 2906 days ago

Didn't someone just recently post a DQN solution for Montezuma's Revenge (the game that according to this article they cannot solve)?

> "Though DQN is great at games like Breakout, it is still not able to tackle relatively simple games like Montezuma's Revenge"

Yep:

https://www.engadget.com/2016/06/09/google-deepmind-ai-monte...

https://blog.openai.com/learning-montezumas-revenge-from-a-s...

It's far too early in this research to say exactly what can and can't be solved by RL.

fnbr 2906 days ago

The OpenAI solution uses demonstrations though, which is the article's point: bare DQN can't solve these games, and something like demonstrations is needed.

radarsat1 2906 days ago

But if you look at how it uses the demonstrations, it's quite interesting. It uses them only as a series of starting points to learn from. It doesn't actually use state-action pair examples at all, as far as I understand, which is quite different from the idea that comes to mind from the phrase "uses demonstrations". It simply starts the simulation at places that the single demonstration it has access to reached. In other words, the examples it learns from say nothing more than "by this point in the game you could get here..." and nothing about "when you are here, you should do this..."

In a sense it's pretty similar to how you'd learn a game if you watched someone play it through once. (Except backwards, perhaps.)
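
A minimal sketch of that idea, under loud assumptions: the corridor task, the `reset_to` hook, the tabular Q-learner and every parameter below are hypothetical stand-ins, not anything taken from the OpenAI post. It only illustrates using the states one demonstration passed through as start points, beginning near the goal and moving the start earlier once the agent succeeds reliably. No demonstrated actions are used anywhere.

```python
import random
from collections import deque


class Corridor:
    """Toy sparse-reward task (hypothetical): reach the last cell of a corridor."""

    def __init__(self, length=8):
        self.length = length
        self.pos = 0

    def reset_to(self, state):
        # Assumed "restore the simulator to this state" hook.
        self.pos = state
        return self.pos

    def step(self, action):
        # action 0 = left, 1 = right; the only reward is at the goal cell.
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        reached = self.pos == self.length - 1
        return self.pos, (1.0 if reached else 0.0), reached


def run_episode(env, q, start, rng, max_steps=50, eps=0.1, gamma=0.9):
    """One tabular Q-learning episode from `start`; True if the goal was reached."""
    s = env.reset_to(start)
    for _ in range(max_steps):
        if rng.random() < eps or q[s][0] == q[s][1]:
            a = rng.randrange(2)  # explore, or break ties at random
        else:
            a = 0 if q[s][0] > q[s][1] else 1
        s2, r, reached = env.step(a)
        # Learning rate 1 is enough here because the toy task is deterministic.
        q[s][a] = r if reached else gamma * max(q[s2])
        if reached:
            return True
        s = s2
    return False


def train_backward(env, demo_states, rng, budget=2000, window=10, needed=8):
    """Start from the demo state nearest the goal; step the start point backwards."""
    q = [[0.0, 0.0] for _ in range(env.length)]
    i = len(demo_states) - 1
    recent = deque(maxlen=window)
    for _ in range(budget):
        recent.append(run_episode(env, q, demo_states[i], rng))
        if sum(recent) >= needed:  # succeeding reliably from this start point
            if i == 0:
                break
            i -= 1  # move the start one demo state earlier
            recent.clear()
    return q, i


rng = random.Random(0)
env = Corridor(length=8)
demo = list(range(env.length - 1))  # states only, no state-action pairs
q, start_index = train_backward(env, demo, rng)
solved = run_episode(env, q, demo[0], rng, eps=0.0)  # greedy run from the true start
```

The demonstration enters only through `demo_states[i]`, which is why the whole scheme depends on being able to reset the simulator to an arbitrary earlier state.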

andreyk 2906 days ago

yep, we in fact link to this work...

"Even 5 years later, no pure RL algorithms have cracked reasoning and memory games; on the contrary, approaches that have done well at them have either used instructions <link> or demonstrations <link> just as we mentioned would make sense to do in the board game allegory."

backpropaganda 2906 days ago

It also assumes access to the simulator, which is an even more problematic assumption. That's like saying you're doing image classification but assuming access to the 3D model which generated the image.

radarsat1 2906 days ago

I think that analogy is a bit bogus, but if you want to make it, it's more like assuming access to a function that renders the 3D model from a variety of perspectives on command, not having access to the model itself.

(Because the RL algorithm doesn't have access to the rules by which the simulation is carried out, it only has access to the commands and the result.)

And frankly, that would be a perfectly fair and interesting classification problem, so I don't see your point.

Otherwise, how exactly do you propose learning to drive a simulation without access to the simulation? I really don't know what you're saying here.
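
To make the two kinds of "access" in this subthread concrete, here is a rough sketch. All names (`BlackBoxEnv`, `RestorableEnv`, `CounterGame`, `StepOnly`) are hypothetical and invented for illustration. The first interface is the usual commands-and-results view, where the rules stay hidden; the second adds the ability to snapshot and restore simulator state, which is the extra assumption being argued about.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class BlackBoxEnv(Protocol):
    """Commands in, results out; the rules are never exposed."""

    def reset(self) -> Any: ...
    def step(self, action: int) -> Tuple[Any, float, bool]: ...


@runtime_checkable
class RestorableEnv(BlackBoxEnv, Protocol):
    """Same black box, plus the ability to jump back to a saved state."""

    def snapshot(self) -> Any: ...
    def restore(self, snap: Any) -> Any: ...


class CounterGame:
    """Toy game with a hidden rule: reward when the counter hits the target."""

    def __init__(self, target=3):
        self._target = target
        self._count = 0

    def reset(self):
        self._count = 0
        return self._count

    def step(self, action):
        self._count += 1 if action == 1 else 0
        done = self._count == self._target
        return self._count, (1.0 if done else 0.0), done

    def snapshot(self):
        return self._count

    def restore(self, snap):
        self._count = snap
        return self._count


class StepOnly:
    """Wrapper that hides snapshot/restore, leaving only the black-box half."""

    def __init__(self, env):
        self._env = env

    def reset(self):
        return self._env.reset()

    def step(self, action):
        return self._env.step(action)


game = CounterGame()
game.reset()
game.step(1)
game.step(1)
snap = game.snapshot()      # save the state after two moves
game.step(1)                # finish the episode
obs = game.restore(snap)    # jump back to the saved state
replayed = game.step(1)     # the same move gives the same result again
```

Neither interface reveals the rule inside `step`; the disagreement is only over whether `restore` is a fair thing to assume.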

backpropaganda 2906 days ago

My point is that the two problems are quite distinct. This is not a small change to how the problem is being solved, but a complete change of the problem itself. Further, the change significantly limits the feasibility of the solution, which the authors of the blog post don't make sufficiently clear. Casual followers of AI/RL research might think that this is significant progress, while in fact it's progress on a problem that hasn't really received any attention due to its uselessness. I think there may be 1-2 papers with experiments on this problem, versus probably hundreds on the model-free problem.

Thanks for your analogy though. I agree that it's better than mine. I was only trying to give a rough idea, but I'll use your analogy if I have to now. :)

goatlover 2906 days ago

True, but the OpenAI article on Montezuma's Revenge stated that their approach didn't work for Pitfall and one other game.
