Hacker News
by gwern 2173 days ago
MuZero doesn't search over explicit board states, by MCTS (where are the playouts at the leaf nodes? where is the simulator state?) or otherwise: it searches over pairs of actions and internal, abstract, reward-predictive states, much like a human, who thinks through possible actions using an internal representation, sometimes backtracking when they decide a move is bad and evaluating a different one that feels better. It's a search over a tree of possible actions, sure, but it is not MCTS, even if they loosely use the phrase in places (similarly, the tree search Zero does for gameplay is not MCTS, even if people sometimes describe it that way or conflate it with the training).
1 comment

I'm a bit confused. MuZero doesn't do a "formal search" (as per your previous comment) but it does a tree search (as per your current comment). It doesn't perform MCTS, even though the paper itself states it performs MCTS.

I quote from the arxiv paper again:

Appendix B Search

We now describe the search algorithm used by MuZero. Our approach is based upon Monte-Carlo tree search with upper confidence bounds, an approach to planning that converges asymptotically to the optimal policy in single agent domains and to the minimax value function in zero sum games [22].

So I'm sorry but I really don't understand what you mean here.

Also, as discussed in other comments, it's dangerous to assume anything about how "a human does" any kind of mental calculation. Whatever MuZero does, and however well or badly it does it, there is nothing to tell us that it does it as a human does.
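
For anyone following along who hasn't met the term: the "upper confidence bounds" in the quoted appendix refers to a rule for choosing which child of a tree node to visit next. Below is a minimal sketch of the textbook UCB1 form, not the exact variant from the MuZero paper (which also weights the bonus by a learned policy prior). The function name, the `stats` layout and the default constant are all assumptions made up for illustration.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the action with the highest mean value plus exploration bonus.

    `stats` maps action -> (visit_count, total_value). This layout is
    hypothetical, chosen only to keep the sketch short.
    """
    total_visits = sum(n for n, _ in stats.values())
    best_action, best_score = None, -math.inf
    for action, (n, w) in stats.items():
        if n == 0:
            # Unvisited actions are tried first (their bonus is unbounded).
            return action
        # Mean value so far, plus a bonus that shrinks as n grows.
        score = w / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Note that nothing in this rule cares where the value estimates come from: random playouts in a simulator or a learned value function over abstract states both fit, which is part of why the two comments above can disagree about what counts as "MCTS".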