by gwern 2173 days ago
I'm not sure why YeGoblynQueenne thinks this is such a mystery. (This is not the first time I've been puzzled by their pessimism on HN.) There is no mystery here: AlphaZero shows that you can get superhuman performance by searching only a few ply, given sufficiently good pattern recognition in a highly parameterized and well-trained value function, and MuZero makes this point even more emphatically by doing away with the formal search entirely in favor of a more abstract recurrent pondering. What more is there to say?

>> (This is not the first time I've been puzzled by their pessimism on HN.)

I don't understand why you keep making personal comments like that about me. I suspect you don't realise that they are unpleasant. Please let me make it clear: such personal comments are unpleasant. Could you please stop them? Thank you.

MuZero does perform a "formal search", and in more ways than one: for example, optimisation is itself a search for an optimal set of parameters. But I guess you mean that it doesn't perform a tree search? Quoting from the abstract of the paper on arXiv [1]:

In this work we present the MuZero algorithm which, by combining _a tree-based search_ with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

(My underlining)

If I remember correctly, MuZero is model-free in the sense that it learns its own evaluation function and reward policy etc (also going by the abstract). But it retains MCTS.

Indeed, it wouldn't really make sense to drop MCTS from the architecture of a system designed to play games. I mean, it would be really hard to justify discarding a component that is well known to work and work well, both from an engineering and a scientific point of view.

_________________

[1] https://arxiv.org/abs/1911.08265

MuZero doesn't do a search over explicit board states, by MCTS (where are the playouts at the leaf nodes? where is the simulator state?) or otherwise: it searches over pairs of internal, abstract, reward-predictive states and actions, much as a human does when thinking through possible actions using an internal representation, sometimes backtracking when they decide a move is bad and evaluating a different one which feels better. It's search over a tree of possible actions, sure, but this is not MCTS, even if they loosely use the phrase in places (similarly, the tree search Zero does for gameplay is not MCTS, even if people sometimes describe it that way or conflate it with the training).

I'm a bit confused. MuZero doesn't do a "formal search" (as per your previous comment) but it does do a tree search (as per your current comment). And it doesn't perform MCTS, even though the paper itself states that it does.

I quote from the arxiv paper again:

Appendix B Search

We now describe the search algorithm used by MuZero. Our approach is based upon Monte-Carlo tree search with upper confidence bounds, an approach to planning that converges asymptotically to the optimal policy in single agent domains and to the minimax value function in zero sum games [22].

So I'm sorry but I really don't understand what you mean here.

Also, as discussed in other comments, it's dangerous to assume anything about how "a human does" anything to do with any kind of mental calculation. Whatever MuZero does, and however well or badly it does it, there is nothing to tell us that it does it as a human does.
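
On the search question, it may help to make the thing being argued about concrete. Below is a minimal sketch of a tree search with upper confidence bounds in which the tree is grown by calling a model on abstract states and leaves are scored by a value function rather than by random playouts. It is not MuZero's implementation: it uses plain UCB1 rather than the paper's exact selection rule, it assumes a single-agent setting, and `Node`, `search`, `toy_dynamics` and `toy_value` are hypothetical names, with the two toy functions standing in for learned networks.

```python
import math


class Node:
    """One node of the search tree, holding an abstract state (not a board)."""

    def __init__(self, state, reward=0.0):
        self.state = state        # state produced by the model, never by a simulator
        self.reward = reward      # predicted reward for the step into this node
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def ucb_score(parent, child, c, discount):
    """Plain UCB1: estimated return of the move plus an exploration bonus."""
    if child.visits == 0:
        return math.inf           # try every action once before comparing
    q = child.reward + discount * child.mean_value()
    return q + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(root_state, dynamics, value, actions, num_simulations=50,
           discount=0.5, c=1.4):
    root = Node(root_state)
    for _ in range(num_simulations):
        node, path = root, [root]
        # Selection: walk down the tree, taking the child with the best score.
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb_score(parent, ch, c, discount))
            path.append(node)
        # Expansion: ask the model (not a game simulator) for the successors.
        for action in actions:
            next_state, reward = dynamics(node.state, action)
            node.children[action] = Node(next_state, reward)
        # Evaluation: a value estimate replaces the random playout.
        g = value(node.state)
        # Backup: propagate the discounted return to the root.
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += g
            g = visited.reward + discount * g
    best_action = max(root.children, key=lambda a: root.children[a].visits)
    return best_action, root


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics network."""
    return state + 1, float(action)   # action 1 is rewarded, action 0 is not


def toy_value(state):
    """Hypothetical stand-in for a learned value network; it knows nothing."""
    return 0.0


best_action, root = search(0, toy_dynamics, toy_value, actions=(0, 1))
print(best_action, {a: ch.visits for a, ch in root.children.items()})
```

The loop has the usual selection, expansion, evaluation and backup steps, with upper confidence bounds doing the selection, but there are no playouts and no simulator state anywhere in it. Whether that should still be called MCTS is the terminological point in dispute; the sketch only shows the structure, it does not settle the naming.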