The OpenAI solution uses demonstrations though, which is the article's point: bare DQN can't solve these games, and something like demonstrations is needed.
But if you look at how it uses the demonstrations, it's quite interesting. It uses them only as a series of starting points to learn from. It doesn't actually use state-action pair examples at all, as far as I understand, which is quite different from the idea that comes to mind from the phrase "uses demonstrations". It simply starts up the simulation at states that the one single demonstration it has access to reached. In other words, the examples it is learning from are nothing but "by this point in the game you could get here..." and nothing about "when you are here, you should do this..."
In a sense it's pretty similar to how you'd learn a game if you watched someone play it through once. (Except backwards, perhaps.)
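To make the "starting points only" idea concrete, here is a rough toy sketch. It is not OpenAI's code: the environment (`ChainEnv`), the `restore` call, and the use of tabular Q-learning are all my own assumptions for illustration. The demonstration is just a list of states that were reached, with no actions attached, and training walks that list backwards, moving the start point earlier once the agent succeeds often enough.

```python
import random


class ChainEnv:
    """Hypothetical toy simulator: a line of cells, reward only at the last one."""

    def __init__(self, length=12, max_steps=30):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def restore(self, state):
        # Assumption: the simulator can be reset to any previously seen state.
        self.pos = state
        self.steps = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        reached = self.pos == self.length
        done = reached or self.steps >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


def run_episode(env, q, start, rng, epsilon=0.2, alpha=0.5, gamma=0.95):
    """One epsilon-greedy Q-learning episode from a given start state."""
    state = env.restore(start)
    done = False
    reward = 0.0
    while not done:
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            action = rng.randrange(2)  # explore, or break ties randomly
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        nxt, reward, done = env.step(action)
        # The goal state's row is never updated, so it contributes 0 here.
        target = reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return reward > 0  # True if the goal was reached


def train_from_demo(env, demo_states, rng, batch=20, threshold=0.8, max_batches=200):
    """Use the demo only as a list of start states, latest first."""
    q = [[0.0, 0.0] for _ in range(env.length + 1)]
    for start in reversed(demo_states[:-1]):
        for _ in range(max_batches):
            wins = sum(run_episode(env, q, start, rng) for _ in range(batch))
            if wins / batch >= threshold:
                break  # good enough from here, move the start point earlier
    return q


def greedy_rollout(env, q, start=0):
    """Follow the learned values without exploration; report success."""
    state = env.restore(start)
    done = False
    reward = 0.0
    while not done:
        action = 0 if q[state][0] > q[state][1] else 1
        state, reward, done = env.step(action)
    return reward > 0


if __name__ == "__main__":
    env = ChainEnv()
    demo = list(range(env.length + 1))  # states the demonstrator passed through
    q = train_from_demo(env, demo, random.Random(0))
    print("solved from the real start:", greedy_rollout(env, q, 0))
```

Note that `demo` never says which action was taken anywhere. The agent still has to discover "go right" by trial and error; the demo only makes the reward reachable by starting close to it first.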
"Even 5 years later, no pure RL algorithms have cracked reasoning and memory games; on the contrary, approaches that have done well at them have either used instructions <link> or demonstrations <link> just as we mentioned would make sense to do in the board game allegory."
It also assumes access to the simulator, which is an even more problematic assumption. That's like saying you're doing image classification but assuming access to the 3D model which generated the image.
I think that analogy is a bit bogus, but if you want to make it, it's more like assuming access to a function that renders the 3D model from a variety of perspectives on command, not having access to the model itself.
(Because the RL algorithm doesn't have access to the rules by which the simulation is carried out; it only has access to the commands and the results.)
And frankly, that would be a perfectly fair and interesting classification problem, so I don't see your point.
Otherwise, how exactly do you propose learning to drive a simulation without access to the simulation? I really don't know what you're saying here.
My point is that the two problems are quite distinct. This is not a small change to how the problem is being solved, but a complete change of the problem itself. Further, the change significantly limits the feasibility of the solution, which the authors of the blog post do not make sufficiently clear. Casual followers of AI/RL research might think that this is significant progress, while in fact it's progress on a problem that hasn't really received any attention due to its uselessness. I think there may be 1-2 papers with experiments on this problem, versus probably hundreds on the model-free problem.
Thanks for your analogy though. I agree that it's better than mine. I was only trying to give a rough idea, but I'll use your analogy if I have to now. :)