I think MuZero is a fascinating algorithm, but a lot of news articles are misleading when they present it as a new, superior substitute for AlphaZero.

MuZero is solving a harder problem, in which the learning agent does not have a model of the environment from the start (e.g. it does not know the rules of the game a priori). This makes it potentially applicable to a larger number of real-world challenges.
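To make the distinction concrete, here is a toy Python sketch, not DeepMind's implementation. The counting game, the names `true_step`, `fit_tabular_model` and `greedy_action`, and the one-step greedy planner are all assumptions made up for illustration. The real algorithms use neural networks and Monte Carlo tree search, and MuZero learns its dynamics in a latent space rather than in a lookup table. The only point is that the same planner can query either the known rules (AlphaZero-style) or a model fitted from observed transitions (MuZero-style).

```python
from typing import Callable, Iterable, Tuple

State = int
Action = int
StepFn = Callable[[State, Action], Tuple[State, float]]

ACTIONS = (1, 2)   # hypothetical toy game: add 1 or 2 to a counter
TARGET = 5         # reward is given for landing exactly on this value


def true_step(state: State, action: Action) -> Tuple[State, float]:
    """Ground-truth rules of the toy game (what an AlphaZero-style agent is given)."""
    nxt = state + action
    return nxt, (1.0 if nxt == TARGET else 0.0)


def fit_tabular_model(
    transitions: Iterable[Tuple[State, Action, State, float]]
) -> StepFn:
    """Build a stand-in 'learned model' from observed (s, a, s', r) tuples.

    Assumption: a lookup table replaces the learned networks a real
    MuZero-style agent would train.
    """
    table = {(s, a): (s2, r) for s, a, s2, r in transitions}

    def learned_step(state: State, action: Action) -> Tuple[State, float]:
        # Unseen (state, action) pairs: predict no change and zero reward.
        return table.get((state, action), (state, 0.0))

    return learned_step


def greedy_action(state: State, step: StepFn) -> Action:
    """One-step lookahead: pick the action with the highest predicted reward."""
    return max(ACTIONS, key=lambda a: step(state, a)[1])


# The agent without rules only sees transitions gathered by interacting.
observed = [(s, a, *true_step(s, a)) for s in range(TARGET) for a in ACTIONS]
learned_step = fit_tabular_model(observed)

# Same planner, two different sources of "what happens next".
print(greedy_action(3, true_step))     # plans with the known rules
print(greedy_action(3, learned_step))  # plans with the fitted model
```

In this sketch the fitted model is only as good as the transitions it has seen, which is one way to picture why learning the model on top of the policy is the harder problem.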

However, I haven't seen any evidence that it is any better than AlphaZero at learning games such as Chess or Go. Although DeepMind reports that their MuZero agent "slightly exceeds the performances of AlphaZero on Go", they say nothing about the training time and tuning effort spent on each.

As far as I understand, and in the absence of further data, I think AlphaZero is still the better choice for solving games with known rules, especially if you don't have DeepMind's level of computing resources.

If anyone knows more about this, I would be happy to be proven wrong.