by YeGoblynQueenne
2392 days ago
Regarding MuZero: I confess to not having read the paper very carefully, but I
am confused by its claim that the new system achieves superhuman performance
without knowing the game rules.

Specifically, MuZero uses MCTS, and MCTS needs to have at the very least a move
generator in order to produce actions that can then be evaluated for their
results. The trained MuZero model learns the transition function and
evaluation function, but I don't see in the paper where it learns what actions
are legal in the domain. And I don't understand how any architecture could
model the possible moves in a game without observing examples of external play
(i.e. not self-play).

MuZero reuses the AlphaZero architecture, so most likely the moves of the
pieces for Chess, Shogi and Go are hard-coded in the architecture, as they
are in AlphaZero. There's also probably some similar hard-coding of Atari
actions, which I'm probably missing in the paper.
> probably some similar hard-coding of Atari actions
Nope, no hard coding.
Consider trying to run MCTS on an Atari game. You have to "learn to predict" the <next frame, action> pairs. Initially these predictions are very bad, but eventually they are good enough that rolling out a tree of predictions improves your action selection.
For Go and chess, we twist ourselves into NOT using the game rules in the simulator, e.g. for each move, the environment just indicates whether the game is a LOSS or a WIN.
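To make the "tree of predictions" idea concrete, here is a toy sketch of planning inside a model instead of a rules-based simulator. It is not MuZero's code: `dynamics` and `value` are hypothetical hand-written stand-ins for trained networks, the integer state and three-action set are made up for illustration, and an exhaustive depth-limited lookahead replaces MCTS to keep it short. The point is only that the planner enumerates a fixed set of action indices and asks the model what happens, without ever calling a move generator.

```python
from itertools import product

# Fixed action indices; the planner never asks which ones are "legal".
ACTIONS = (0, 1, 2)

def dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics network.

    Maps (state, action) to (next_state, predicted_reward).
    Action 0 moves left, 1 stays, 2 moves right; landing on 3 pays 1.0.
    """
    next_state = state + (action - 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

def value(state):
    """Hypothetical stand-in for a learned value head."""
    return -0.1 * abs(3 - state)

def plan(root_state, depth=3, discount=0.9):
    """Exhaustive depth-limited lookahead through the model (simplified, not MCTS)."""
    best_action, best_return = None, float("-inf")
    for seq in product(ACTIONS, repeat=depth):
        state, total, scale = root_state, 0.0, 1.0
        for action in seq:
            state, reward = dynamics(state, action)  # imagined step, no real simulator
            total += scale * reward
            scale *= discount
        total += scale * value(state)  # bootstrap from the value estimate at the leaf
        if total > best_return:
            best_action, best_return = seq[0], total
    return best_action

print(plan(0))  # first action of the best imagined 3-step sequence from state 0
```

With a poorly trained model the imagined rollouts would be wrong and the chosen action no better than chance; the lookahead only helps once the predictions are reasonably accurate.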
Whether this paper is worthy of a new Nature hype cycle is a separate debate.