by jpfr 4198 days ago
Integration in MCTS should be straightforward, and I would be surprised if the authors hadn't done it already. Normally, no actions are pruned away per se. Instead, the available actions are initialised with a utility value [1] from an external oracle, i.e. the neural net. From then on, the normal MCTS procedure continues until time or memory runs out.

[1] ...and a faked visit count, so the algorithm believes that the action was already evaluated n times, which resulted in the given mean utility.
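
A minimal sketch of what the footnote describes, not taken from the paper. The names `ActionStats`, `init_with_prior`, `prior_value` and `prior_visits` are made up for illustration, and the numbers are arbitrary. The action starts out as if it had already been visited `prior_visits` times with mean utility `prior_value`, and real playout results are then averaged in by the usual backup:

```python
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Running statistics for one action at an MCTS node (hypothetical type)."""
    visits: float = 0.0       # real plus faked visit count
    total_value: float = 0.0  # sum of real plus faked utilities

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0

    def update(self, value: float) -> None:
        # Standard MCTS backup: one more visit, one more observed utility.
        self.visits += 1
        self.total_value += value


def init_with_prior(prior_value: float, prior_visits: float) -> ActionStats:
    # Pretend the action was already evaluated `prior_visits` times,
    # each time yielding `prior_value` (assumed to come from the oracle).
    return ActionStats(visits=prior_visits,
                       total_value=prior_value * prior_visits)


stats = init_with_prior(prior_value=0.7, prior_visits=10)
stats.update(0.0)  # one real playout that ended in a loss
```

A larger `prior_visits` means more real playouts are needed before the search overrides the oracle's opinion.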

2 comments

I don't think they've done it yet -- in the conclusion they say:

    The most obvious next step is to integrate a 
    DCNN into a full fledged Go playing system. For
    example, a DCNN could be run on a GPU in parallel with
    a MCTS Go program and be used to provide highly quality
    priors for what the strongest moves to consider are. Such
    a system would both be the first to bring sophisticated
    pattern recognitions abilities to playing Go, and have a strong
    potential ability to surpass current computer Go programs.
I agree that the integration should be exceedingly straightforward. I've written MCTS implementations (though not a Go implementation -- I used it on Connect 4 and other easy-to-code games), and it seems like you'd just plug it into your already-existing bias function.
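
For what "plugging it into the bias function" might look like, here is a rough sketch. It assumes a UCT-style selection rule with an extra prior term that fades as visits accumulate (often called progressive bias). That is my choice of technique, not something the paper specifies. `biased_uct`, `c` and `w` are hypothetical names and defaults, and the `+ 1` terms are just one way to avoid dividing by zero for unvisited actions:

```python
import math


def biased_uct(mean_value, visits, parent_visits, prior, c=1.4, w=1.0):
    """UCT score plus a prior-driven bias that decays with the visit count."""
    # Exploration bonus; the +1 terms keep it finite for unvisited actions.
    exploration = c * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    # `prior` is assumed to be the network's score for this action.
    bias = w * prior / (visits + 1)
    return mean_value + exploration + bias


def select_action(children, parent_visits):
    # `children` maps action -> (mean_value, visits, prior).
    return max(
        children,
        key=lambda a: biased_uct(children[a][0], children[a][1],
                                 parent_visits, children[a][2]),
    )
```

With equal statistics, the action the network prefers is tried first. After many visits the bias term is negligible and the playout results dominate.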

The authors didn't mention my idea about using the probability distribution output from the network to guide the random playouts, which I would also be interested in.
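
A rough sketch of that playout idea, assuming the network's output is available as a dict from move to probability. `sample_move` and `move_probs` are hypothetical names, and nothing here comes from the paper. Instead of choosing playout moves uniformly at random, each move is drawn in proportion to its predicted probability:

```python
import random


def sample_move(legal_moves, move_probs, rng=random):
    """Pick a playout move in proportion to the network's probabilities."""
    # Moves the network did not score get weight 0.
    weights = [move_probs.get(m, 0.0) for m in legal_moves]
    if sum(weights) <= 0.0:
        # No usable probabilities: fall back to a uniform random move.
        return rng.choice(legal_moves)
    return rng.choices(legal_moves, weights=weights, k=1)[0]
```

In practice the cost of a network evaluation per playout move would matter a lot, so this only shows the sampling step.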

From the paper: "The most obvious next step is to integrate a DCNN into a full fledged Go playing system. For example, a DCNN could be run on a GPU in parallel with a MCTS Go program and be used to provide highly quality priors for what the strongest moves to consider are. Such a system would both be the first to bring sophisticated pattern recognitions abilities to playing Go, and have a strong potential ability to surpass current computer Go programs."