by YeGoblynQueenne
2388 days ago
>> Of course DM has a chess function which does codes the rules of the next
move. It can return a LOSS if you try an illegal move. But this function is
NOT called for the tree roll out.

I see what you mean: the chess function computes the results of actions
returned by the system. But if you do rollouts, you need a set of
actions from which to choose and an internal representation of the states
resulting from those actions. MuZero learns to predict those actions and
states, but that means it selects from sets of possible actions and states.
The paper does not explain where these sets come from.

For Atari I get it: there are the physical-ish controls and the video frames. For
the board games, however, I remember very clearly from the AlphaZero paper that
there was an encoding of "knight moves" and "queen moves". I also remember,
less clearly, that the structure of the network's layers mirrored the layout of
a chessboard. That's what I mean by hard-coding, and in the MuZero paper there
are many references to reusing the AlphaZero architecture and no explanation
of how the same components (board states, moves) are represented in MuZero.
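
To make the "knight moves" and "queen moves" point concrete, here is a rough sketch of the kind of encoding I mean. As far as I recall, AlphaZero's chess policy is an 8x8x73 stack: for each from-square, 56 "queen move" planes (8 directions x 7 distances), 8 "knight move" planes, and 9 underpromotion planes. The plane ordering, the function names and the (dfile, drank) convention below are my own assumptions for illustration, not something taken from either paper.

```python
# Hypothetical sketch of an AlphaZero-style 8x8x73 chess move encoding.
# The plane ordering here is an assumption, not the paper's actual layout.

# Moves are given as (dfile, drank) offsets from the mover's point of view.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
UNDERPROMOTIONS = ["knight", "bishop", "rook"]

NUM_QUEEN_PLANES = len(DIRECTIONS) * 7           # 56
NUM_KNIGHT_PLANES = len(KNIGHT_JUMPS)            # 8
NUM_UNDERPROMO_PLANES = len(UNDERPROMOTIONS) * 3 # 9
NUM_PLANES = NUM_QUEEN_PLANES + NUM_KNIGHT_PLANES + NUM_UNDERPROMO_PLANES  # 73


def _sign(x):
    return (x > 0) - (x < 0)


def move_plane(dfile, drank, underpromotion=None):
    """Map a move offset to one of the 73 planes (board bounds not checked)."""
    if underpromotion is not None:
        # Underpromotions: pawn steps one rank forward, optionally capturing.
        if drank != 1 or dfile not in (-1, 0, 1):
            raise ValueError("underpromotion must be a one-rank pawn move")
        piece = UNDERPROMOTIONS.index(underpromotion)
        return NUM_QUEEN_PLANES + NUM_KNIGHT_PLANES + piece * 3 + (dfile + 1)
    if (dfile, drank) in KNIGHT_JUMPS:
        return NUM_QUEEN_PLANES + KNIGHT_JUMPS.index((dfile, drank))
    distance = max(abs(dfile), abs(drank))
    straight = dfile == 0 or drank == 0 or abs(dfile) == abs(drank)
    if not straight or not 1 <= distance <= 7:
        raise ValueError("not a queen-style or knight-style move")
    direction = DIRECTIONS.index((_sign(dfile), _sign(drank)))
    return direction * 7 + (distance - 1)


def move_index(from_file, from_rank, dfile, drank, underpromotion=None):
    """Flatten (from-square, plane) into a single index in [0, 8*8*73)."""
    plane = move_plane(dfile, drank, underpromotion)
    return (from_rank * 8 + from_file) * NUM_PLANES + plane
```

The point is that a table like this bakes in chess-specific knowledge about which move shapes exist, even if legality is never checked during search. Whether MuZero reuses exactly this action space is the thing I could not find spelled out.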