by yazr
2389 days ago
> but I am confused by its claim ... without knowing the game rules.
> probably some similar hard-coding of Atari actions
Nope, no hard coding.
Consider trying to run MCTS on an Atari game. You have to "learn to predict" the <next frame, action> pairs. Initially this guess is very bad, but eventually your predictions are good enough that rolling out a tree of predictions improves your action selection.
For Go, and chess, we twist ourselves into NOT using the game rules in the simulator e.g. for each move, just indicate if GAME LOSS WIN
Whether this paper is worthy of a new Nature hype cycle is a separate debate.
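A toy sketch of the "learn to predict, then roll out a tree of predictions" idea. This is not MuZero: it swaps in a lookup table for the neural network and exhaustive depth-limited lookahead for MCTS. The environment (`true_step`, a 5-cell corridor), `ACTIONS`, `GOAL`, and every function name are made up for illustration. In this toy, the list of action IDs and the reward signal are handed to the agent from outside; only the transition dynamics are learned.

```python
import random

ACTIONS = (0, 1)   # hypothetical action IDs: 0 = left, 1 = right
GOAL = 4           # hypothetical corridor with cells 0..4


def true_step(state, action):
    """The real environment. The planner below never calls this directly."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def collect_model(num_samples, rng):
    """Learn a table (state, action) -> (next_state, reward) from random experience."""
    model = {}
    for _ in range(num_samples):
        s = rng.randrange(GOAL + 1)
        a = rng.choice(ACTIONS)
        model[(s, a)] = true_step(s, a)
    return model


def lookahead(model, state, depth):
    """Best total predicted reward within `depth` steps, using only the learned table."""
    if depth == 0:
        return 0.0
    best = 0.0
    for a in ACTIONS:
        if (state, a) not in model:   # never observed: no prediction available
            continue
        nxt, r = model[(state, a)]
        best = max(best, r + lookahead(model, nxt, depth - 1))
    return best


def plan(model, state, depth):
    """Pick the action whose predicted rollout looks best (ties go to the first action)."""
    def score(a):
        if (state, a) not in model:
            return 0.0
        nxt, r = model[(state, a)]
        return r + lookahead(model, nxt, depth - 1)
    return max(ACTIONS, key=score)


if __name__ == "__main__":
    learned = collect_model(500, random.Random(0))
    print(plan({}, 0, 4), plan(learned, 0, 4))
```

With an empty table every action scores the same and the choice is arbitrary; once the table covers the corridor, the rollout from cell 0 favours moving right.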
But where do the actions come from?
For example, if I play chess, I could pick up a piece and throw it at my opponent's head. Similarly, if I play Atari I could chuck the controller at the monitor. These are actions I can perform that are available to me because of my basic human anatomy and because of the laws of physics (I can grab and throw, and a thrown object flies through the air until it hits a target or gravity wins).
In the case of MuZero, what actions can the system perform and where do they come from? I don't see where that is described in the paper.
>> For Go, and chess, we twist ourselves into NOT using the game rules in the simulator e.g. for each move, just indicate if GAME LOSS WIN
Similarly - what determines "each move"?
EDIT: I can see in the MuZero paper that "Final outcomes {lose, draw, win} in board games are treated as rewards $u_t \in \{-1, 0, +1\}$ occurring at the final step of the episode" but I also can't see where these come from, i.e. what tells the model that a loss, draw or win has occurred at the end of an episode.
I mean, if you're telling the model what actions can be performed and what the end-state values are, then what game rules are you _not_ giving to the system?
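For concreteness, the reward convention quoted in the EDIT above can be written out as a small sketch. The names `OUTCOME_REWARD` and `episode_rewards` are hypothetical, and the zeros at non-final steps are an assumption, since the quoted sentence only specifies the final step. What supplies the `outcome` argument is left open here, as it is in the question.

```python
# Mapping taken from the quoted sentence: {lose, draw, win} -> {-1, 0, +1}.
OUTCOME_REWARD = {"lose": -1, "draw": 0, "win": +1}


def episode_rewards(num_steps, outcome):
    """Return rewards u_1..u_T for one board-game episode.

    Assumption: every step before the last has reward 0; only the final
    step carries the outcome, as in the quoted convention.
    """
    if num_steps < 1:
        raise ValueError("an episode needs at least one step")
    rewards = [0] * num_steps
    rewards[-1] = OUTCOME_REWARD[outcome]
    return rewards


if __name__ == "__main__":
    print(episode_rewards(3, "win"))
```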