
From "An analysis of UCT in multi-player games", Nathan Sturtevant, 2008: "Multi-player UCT is nearly identical to regular UCT. At the highest level of the algorithm, the tree is repeatedly sampled until it is time to make an action. The sampling process is illustrated in Figure 2. The only difference between this code and a two-player implementation is that in line 5 the average score for player p is used instead of a single average payoff for the state."

I think this is the kind of clear statement that the original paper (and a lot of later writing on the topic) may be lacking. Of course people used this simple generalization before, and it is pretty straightforward, but it is not that obvious at first glance. I've seen quite a lot of code examples, images explaining UCT for games, and articles that just don't say a word about this. Or, even worse, they simply do it wrong for multiplayer games.
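To make the difference concrete, here is a minimal Python sketch of the selection step as I read the quote above. The `Node` class, its field names, and the exploration constant `c` are my own illustrative assumptions, not code from the paper:

```python
import math


class Node:
    """Hypothetical tree node; the field names are illustrative only."""

    def __init__(self, player_to_move, num_players):
        self.player_to_move = player_to_move      # index of the player choosing here
        self.visits = 0
        self.total_reward = [0.0] * num_players   # one running sum per player
        self.children = []


def uct_value(parent, child, c=math.sqrt(2)):
    """UCB1 score of `child` from the viewpoint of the player moving at `parent`."""
    if child.visits == 0:
        return math.inf                           # try unvisited children first
    p = parent.player_to_move
    mean_for_p = child.total_reward[p] / child.visits  # average score for player p
    return mean_for_p + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent):
    """Pick the child that looks best for whoever moves at `parent`."""
    return max(parent.children, key=lambda ch: uct_value(parent, ch))


def backpropagate(path, rewards):
    """Add the per-player rewards of one playout to every node on the path."""
    for node in path:
        node.visits += 1
        for i, r in enumerate(rewards):
            node.total_reward[i] += r
```

The point is the `total_reward[p]` lookup: each node keeps one sum per player, and the exploitation term is read from the perspective of the player who moves at the parent. With a single shared payoff per node, every player would end up maximizing the same number.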

Choice of action is a different topic. If I remember correctly, there was also a paper proving that picking by win rate and picking the most robust branch end up performing the same ;)

Hope you will continue this series, because it is really good and the code examples are really nice!