by YeGoblynQueenne
2396 days ago
>> The MuZero training does not use MCTS, it merely observes sequences of
moves/states/rewards.

I'm sorry, I read the paper a bit more carefully since we're discussing it and
I don't think this is right. It's true that it's been a while since I read the
AlphaZero paper and the details are a bit fuzzy in my memory, but in the
MuZero paper it's clear that MCTS is used to generate a policy and estimated
value for the current hidden state, and to select an action to take at the
current real game state (the "environment"). The observed state and reward
are then reused later as past observations to train the model, together with
future actions, also selected by MCTS. So it seems to me that MCTS is pretty
central to the training process.

The paper does say that any MDP planning algorithm could be used in place of
MCTS, but I don't think anyone seriously plans on using anything other than
MCTS for board games in the foreseeable future.

I'm confused by your use of the term "implicit tree search". Could you
clarify?
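
To make the loop described above concrete, here is a minimal Python sketch of where the search sits in self-play and in the training data. It is an illustration under heavy assumptions, not the paper's implementation: `toy_search` is a random stand-in rather than real MCTS, `toy_env_step` is a made-up environment, and the names `Step`, `self_play` and `training_targets` are hypothetical. The only point is the data flow: the search produces a policy and a value, the action played is sampled from that policy, and all of it is stored and later read back as training targets.

```python
import random
from dataclasses import dataclass

ACTIONS = [0, 1]  # toy action space (assumption for illustration)


@dataclass
class Step:
    observation: int
    action: int            # action actually played, chosen via the search
    reward: float          # reward observed from the real environment
    search_policy: list    # normalized visit counts returned by the search
    search_value: float    # root value estimate returned by the search


def toy_search(observation, rng, simulations=20):
    """Stand-in for MCTS: it only mimics the *outputs* of a search
    (a visit-count policy and a value estimate), not the tree search itself."""
    visits = [0] * len(ACTIONS)
    total_value = 0.0
    for _ in range(simulations):
        visits[rng.choice(ACTIONS)] += 1
        total_value += rng.random()
    policy = [v / simulations for v in visits]
    return policy, total_value / simulations


def toy_env_step(observation, action):
    """Hypothetical environment: reward 1 if the action matches the state's parity."""
    reward = 1.0 if action == observation % 2 else 0.0
    return observation + 1, reward


def self_play(num_steps, seed=0):
    """Generate one trajectory; every action is selected from the search policy."""
    rng = random.Random(seed)
    observation, trajectory = 0, []
    for _ in range(num_steps):
        policy, value = toy_search(observation, rng)
        action = rng.choices(ACTIONS, weights=policy)[0]
        next_observation, reward = toy_env_step(observation, action)
        trajectory.append(Step(observation, action, reward, policy, value))
        observation = next_observation
    return trajectory


def training_targets(trajectory, index, unroll=3):
    """Targets for `unroll` consecutive steps starting at `index`.
    Simplification: the stored search value is used directly as the value
    target; an n-step return would normally be computed from rewards instead."""
    window = trajectory[index:index + unroll]
    return [(s.action, s.reward, s.search_policy, s.search_value) for s in window]
```

For example, `training_targets(self_play(10), 2)` returns three tuples whose actions and policy targets all came out of the search step, which is the sense in which the search is part of training rather than something bolted on afterwards.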