| As someone patiently explained to me 2 yrs ago... For the ATARI, the "real world" is the present frame, and a fixed set of 4 buttons and 4 directions. This of course is the game pre-programmed into the ALE ROM. You can take any action, and get the next frame. But you can't "undo" an action, and you can't restart a game from a fixed state (see the Go-Explore controversy). And you can't explore 4 different actions in an interesting frame. So now, if you learn a network which predicts the next frame, you can enter the world of model-based learning, where we do a simulated move tree roll-out (i.e. not calling the ATARI), try a gazillion moves, and only then select an action and get the next sample. In a formally defined synthetic domain such as chess or logic programming, it is not clear whether this is helpful. We are simply trading one kind of CPU time (calling the environment) for another kind of CPU time (running our own learned, imprecise model of the environment). Of course DM has a chess function which encodes the rules of the next move. It can return a LOSS if you try an illegal move. But this function is NOT called for the tree roll-out. |
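To make sure we're talking about the same thing, here is how I read the setup described above, as a minimal sketch. Everything in it is hypothetical and mine, not taken from any paper: `RealEnv` is a toy stand-in for the emulator (step forward only, no undo), `model` is a stand-in for the learned next-state predictor (hand-written and exact here, where a learned one would be imprecise), and `plan` is a brute-force roll-out rather than whatever tree search is actually used.

```python
from itertools import product

ACTIONS = (-1, +1)  # toy action set: move left or right on a line
GOAL = 3            # reaching this position pays reward 1


class RealEnv:
    """Toy stand-in for the emulator: you can only step forward, no undo."""

    def __init__(self):
        self.state = 0
        self.calls = 0  # how many times the "real world" was queried

    def step(self, action):
        self.calls += 1
        self.state += action
        return self.state, float(self.state == GOAL)


def model(state, action):
    """Stand-in for a learned next-state/reward predictor.

    Assumption: exact and hand-written here; a learned model would be
    approximate, which is the trade-off the quoted comment points at.
    """
    nxt = state + action
    return nxt, float(nxt == GOAL)


def plan(state, depth=3):
    """Try every action sequence inside the model; never touches RealEnv."""
    best_action, best_return = ACTIONS[0], float("-inf")
    for seq in product(ACTIONS, repeat=depth):
        s, total = state, 0.0
        for a in seq:
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_action, best_return = seq[0], total
    return best_action


env = RealEnv()
state = env.state
for _ in range(3):
    # Many simulated moves per decision, exactly one real step.
    state, reward = env.step(plan(state))
```

The only point of the sketch is the call pattern: `plan` queries `model` as often as it likes, while `env.step` is called once per decision.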
I appreciate that someone explained this to you at some point, but I'm going with what I've read in the published papers, and the ones I've read really leave a lot to the imagination. That is no way to present and support claims as big as "no rules" and "no hands", especially when they are the central claim of a paper. Why fudge this so much when it's such an important aspect of the whole contribution? [1] You (general you) make a claim? Support the claim.
I didn't get what you meant about logic programming. Where does that come in?
________________
[1] Oh, I know why. It's the whole silly game with machine learning publications where they never tell you everything and you have to figure it out yourself. Well I like to play the other game, where I call bullshit unless it's explained clearly. In the paper. Not on Twitter and not by kind colleagues.
Silly games don't advance the science though.