| I would be extremely interested in seeing this used as a pruning function for a state-of-the-art MCTS (Monte Carlo Tree Search) Go engine. As it stands, your generic MCTS algorithm expands a game tree of nodes, and gives more attention to more promising branches, but it still must give attention to other branches to find out if they are promising or not (exploration vs. exploitation). In the paper, they get the right move (right as defined by what an expert human would do) 44% of the time, but they also say the right move, if not the #1 choice, is often in the top few choices. According to their graph it's in the top 10 choices about 80% of the time, and in the top 30 choices about 98% of the time. If in MCTS you could prune the branching factor of your tree search down from 300+ to ~30, that could be huge. ===== I'd also be interested in seeing it used as the playout function of an MCTS engine. As it stands, most playout functions use random, or random+quick heuristic to playout thousands (or millions) of random games to rate a position. I imagine if you used this, which can output an entire probability distribution of moves, you could do significantly better than random with a fewer number of games. |
[1] ..and a faked visit-count. So the algorithm beliefs that the action was already evaluated n times, which resulted in the given mean utility.