by YeGoblynQueenne 2388 days ago
>> Of course DM has a chess function which does codes the rules of the next move. It can return a LOSS if you try an illegal move. But this function is NOT called for the tree roll out.

I see what you mean: the chess function computes the results of actions returned by the system. But if you do rollouts, you need a set of actions from which to choose and an internal representation of the states resulting from those actions. MuZero learns to predict those actions and states, but that means it selects from sets of possible actions and states. The paper does not explain where these sets come from.

For Atari I get it, there are the physical-ish controls and the video frames. For the board games, however, I remember very clearly from the AlphaZero paper that there was an encoding of "knight moves" and "queen moves". I also remember, less clearly, that the structure of the network's layers mirrored the layout of a chessboard. That's what I mean by hard-coding. In the MuZero paper there are many references to reusing the AlphaZero architecture, and no explanation of how the same components (board states, moves) are represented in MuZero.
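
To make the "knight moves" and "queen moves" point concrete, here is a rough sketch of the chess action encoding as I understand it from the AlphaZero paper: an 8x8x73 stack, i.e. for each from-square, 56 "queen move" planes (8 directions x 7 distances), 8 "knight move" planes and 9 underpromotion planes (3 pieces x 3 directions). Only those counts come from the paper. The function names, the ordering of directions and planes, and the flattening into a single index are my own assumptions for illustration, not DeepMind's code.

```python
# Sketch of an AlphaZero-style chess move encoding (8 x 8 x 73 = 4672 actions).
# The plane COUNTS follow the AlphaZero paper; the ORDERING below is assumed.

# (file delta, rank delta) for the 8 compass directions; order is an assumption.
QUEEN_DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),
                    (0, -1), (-1, -1), (-1, 0), (-1, 1)]
MAX_DISTANCE = 7  # a sliding piece can travel at most 7 squares

# The 8 knight jumps; order is an assumption.
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

# Underpromotions: 3 pieces x 3 pawn directions (capture left, push, capture right).
UNDERPROMO_PIECES = ["knight", "bishop", "rook"]
UNDERPROMO_DIRECTIONS = [-1, 0, 1]

NUM_QUEEN_PLANES = len(QUEEN_DIRECTIONS) * MAX_DISTANCE                     # 56
NUM_KNIGHT_PLANES = len(KNIGHT_JUMPS)                                       # 8
NUM_UNDERPROMO_PLANES = len(UNDERPROMO_PIECES) * len(UNDERPROMO_DIRECTIONS)  # 9
NUM_PLANES = NUM_QUEEN_PLANES + NUM_KNIGHT_PLANES + NUM_UNDERPROMO_PLANES    # 73


def queen_move_plane(direction_index: int, distance: int) -> int:
    """Plane for sliding `distance` squares (1..7) in one of 8 directions."""
    assert 0 <= direction_index < len(QUEEN_DIRECTIONS)
    assert 1 <= distance <= MAX_DISTANCE
    return direction_index * MAX_DISTANCE + (distance - 1)


def knight_move_plane(jump_index: int) -> int:
    """Plane for one of the 8 knight jumps."""
    assert 0 <= jump_index < len(KNIGHT_JUMPS)
    return NUM_QUEEN_PLANES + jump_index


def underpromotion_plane(piece_index: int, direction_index: int) -> int:
    """Plane for promoting to knight/bishop/rook in one of 3 pawn directions."""
    assert 0 <= piece_index < len(UNDERPROMO_PIECES)
    assert 0 <= direction_index < len(UNDERPROMO_DIRECTIONS)
    offset = NUM_QUEEN_PLANES + NUM_KNIGHT_PLANES
    return offset + piece_index * len(UNDERPROMO_DIRECTIONS) + direction_index


def action_index(from_file: int, from_rank: int, plane: int) -> int:
    """Flatten (from-square, plane) into one index in [0, 8 * 8 * 73)."""
    assert 0 <= from_file < 8 and 0 <= from_rank < 8
    assert 0 <= plane < NUM_PLANES
    square = from_rank * 8 + from_file
    return square * NUM_PLANES + plane
```

The point is that the shape of the action space already bakes in how chess pieces move, before any learning happens. The policy picks from 4672 slots whose meaning is fixed by hand, and that fixed set is what I can't find accounted for in the MuZero paper.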