
by gamegoblin 4246 days ago

I would be extremely interested in seeing this used as a pruning function for a state-of-the-art MCTS (Monte Carlo Tree Search) Go engine.

As it stands, your generic MCTS algorithm expands a game tree of nodes, and gives more attention to more promising branches, but it still must give attention to other branches to find out if they are promising or not (exploration vs. exploitation).

In the paper, they get the right move (right as defined by what an expert human would do) 44% of the time, but they also say the right move, if not the #1 choice, is often in the top few choices. According to their graph it's in the top 10 choices about 80% of the time, and in the top 30 choices about 98% of the time.

If in MCTS you could prune the branching factor of your tree search down from 300+ to ~30, that could be huge.
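To make the pruning idea concrete, here is a minimal sketch. It assumes the network's output is available as a dict mapping each legal move to a probability; `prune_moves` and the `move_probs` format are hypothetical names for illustration, not anything from the paper.

```python
import heapq


def prune_moves(move_probs, k=30):
    """Return the k moves the policy rates highest, best first.

    move_probs: dict mapping move -> probability. This stands in for the
    network's output; the format is an assumption, not the paper's API.
    """
    return heapq.nlargest(k, move_probs, key=move_probs.get)


# Toy example with four candidate moves, keeping the top two.
candidates = {"D4": 0.50, "Q16": 0.30, "C3": 0.15, "A1": 0.05}
kept = prune_moves(candidates, k=2)
```

The tree search would then expand only the moves in `kept`. Going by the figures above, a cutoff of 30 would still drop the expert move roughly 2% of the time, so a hard cutoff trades some accuracy for the smaller branching factor.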

=====

I'd also be interested in seeing it used as the playout function of an MCTS engine.

As it stands, most playout functions use random moves, or random moves plus a quick heuristic, to play out thousands (or millions) of games to rate a position. I imagine that if you used this network, which can output an entire probability distribution over moves, you could do significantly better than random with fewer games.
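A rough sketch of what a policy-guided playout step could look like, again assuming the network's output arrives as a dict of move probabilities. `sample_move` is a hypothetical helper, and the fallback to uniform random is my own assumption for the case where the network puts no mass on any legal move.

```python
import random


def sample_move(move_probs, legal_moves, rng=random):
    """Pick a playout move in proportion to the policy's probabilities.

    move_probs: dict move -> probability (assumed network output format).
    legal_moves: non-empty list of moves allowed in the current position.
    """
    weights = [move_probs.get(m, 0.0) for m in legal_moves]
    if sum(weights) <= 0:
        # The policy gives no guidance here, so fall back to a plain
        # uniform-random playout move.
        return rng.choice(legal_moves)
    return rng.choices(legal_moves, weights=weights, k=1)[0]
```

Each playout would call this once per move instead of picking uniformly. Whether the better playouts are worth the cost of a network evaluation per move is an open question this sketch does not address.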


jpfr 4246 days ago

Integration in MCTS should be straightforward, and I would be surprised if the authors hadn't done it already. Normally, no actions are pruned away per se. Instead, the available actions are initialised with a utility value [1] from an external oracle, i.e. the neural net. From then on, the normal MCTS procedure continues until time or memory runs out.

[1] ...and a faked visit count, so the algorithm believes that the action was already evaluated n times, which resulted in the given mean utility.
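The seeding described above can be sketched as follows. This assumes a plain UCB1 selection rule; `Node`, `ucb1`, and the exploration constant `c=1.4` are illustrative choices, not taken from the paper or from any particular engine.

```python
import math


class Node:
    """Statistics for one action in the tree."""

    def __init__(self, prior_value=0.0, prior_visits=0):
        # Pretend the action was already tried prior_visits times with
        # mean utility prior_value (both would come from the oracle).
        self.visits = prior_visits
        self.total_value = prior_value * prior_visits

    def mean(self):
        return self.total_value / self.visits if self.visits else 0.0

    def update(self, result):
        # Real playout results are folded in on top of the faked ones.
        self.visits += 1
        self.total_value += result


def ucb1(child, parent_visits, c=1.4):
    """Standard UCB1 score; unvisited children are always tried first."""
    if child.visits == 0:
        return math.inf
    return child.mean() + c * math.sqrt(math.log(parent_visits) / child.visits)


# Two actions seeded with different priors but the same faked visit count.
good = Node(prior_value=0.8, prior_visits=10)
bad = Node(prior_value=0.2, prior_visits=10)
```

With these seeds the search initially favours `good`, but nothing is pruned: as real results accumulate through `update`, the faked statistics get diluted and `bad` can still overtake it. The faked visit count controls how much the search trusts the oracle.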


gamegoblin 4246 days ago

I don't think they've done it yet -- in the conclusion they say:

    The most obvious next step is to integrate a 
    DCNN into a full fledged Go playing system. For
    example, a DCNN could be run on a GPU in parallel with
    a MCTS Go program and be used to provide highly quality
    priors for what the strongest moves to consider are. Such
    a system would both be the first to bring sophisticated
    pattern recognitions abilities to playing Go, and have a strong
    potential ability to surpass current computer Go programs.

I agree that the integration should be exceedingly straightforward. I've written MCTS implementations (though not a Go implementation -- I used it on Connect 4 and other easy-to-code games), and it seems like you'd just plug it into your already-existing bias function.

The authors didn't mention my idea about using the probability distribution output from the network to guide the random playouts, which I would also be interested in.


xxxaaa 4246 days ago

From the paper: "The most obvious next step is to integrate a DCNN into a full fledged Go playing system. For example, a DCNN could be run on a GPU in parallel with a MCTS Go program and be used to provide highly quality priors for what the strongest moves to consider are. Such a system would both be the first to bring sophisticated pattern recognitions abilities to playing Go, and have a strong potential ability to surpass current computer Go programs."
