by jpfr
4198 days ago
Integration in MCTS should be straightforward, and I would be surprised if the authors hadn't done it already. Normally, no actions are pruned away per se. Instead, the available actions are initialised with a utility value [1] from an external oracle, i.e. the neural net. From then on, the normal MCTS procedure continues until time or memory runs out.

[1] ...and a faked visit count. The algorithm then believes that the action has already been evaluated n times, with the given mean utility as the result.
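A minimal Python sketch of that seeding, assuming UCB1 as the selection rule and a toy dict standing in for the neural net. The names `init_children`, `select`, `backup` and the parameter `fake_visits` (the n from the footnote) are hypothetical, not from the paper:

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    """Statistics for one action; the oracle's prior is folded in as fake visits."""
    total_value: float
    visits: int

    @property
    def mean(self) -> float:
        return self.total_value / self.visits


def init_children(actions, oracle, fake_visits=10):
    # No action is pruned: every legal action gets a node, seeded as if it had
    # already been evaluated fake_visits times with mean utility oracle(action).
    return {a: Child(total_value=oracle(a) * fake_visits, visits=fake_visits)
            for a in actions}


def select(children, c=1.4):
    # Plain UCB1 over the (partly faked) statistics.
    total = sum(ch.visits for ch in children.values())
    return max(children, key=lambda a: children[a].mean
               + c * math.sqrt(math.log(total) / children[a].visits))


def backup(children, action, reward):
    # Real playout results are simply added on top of the fake prior evidence.
    ch = children[action]
    ch.total_value += reward
    ch.visits += 1


if __name__ == "__main__":
    prior = {"a": 0.8, "b": 0.2, "c": 0.5}  # toy stand-in for the net's utility estimates
    children = init_children(prior, prior.get, fake_visits=10)
    print(select(children))  # all visit counts are equal, so the prior decides: a
```

The larger `fake_visits` is, the more real playouts it takes to pull an action's mean away from the oracle's estimate, so it acts as a confidence knob for the net.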
The authors didn't mention my idea of using the probability distribution output by the network to guide the random playouts, which I would also be interested in.
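Roughly what I mean, as a Python sketch: instead of picking playout moves uniformly at random, sample them in proportion to a policy's output. The toy game and `net_like_policy` are hypothetical stand-ins; a real version would query the network (which may be too slow to call at every playout step):

```python
import random
from typing import Optional


def guided_rollout(state, legal_actions, policy, step, is_terminal, reward,
                   rng: Optional[random.Random] = None) -> float:
    """Play one simulated game, sampling each move from policy(state, actions)."""
    if rng is None:
        rng = random.Random()
    while not is_terminal(state):
        actions = legal_actions(state)
        weights = policy(state, actions)  # one non-negative weight per action
        action = rng.choices(actions, weights=weights, k=1)[0]
        state = step(state, action)
    return reward(state)


# Hypothetical toy game: add 1 or 2 to a counter; you win by landing exactly on 10.
def toy_legal_actions(state):
    return [1, 2]


def toy_step(state, action):
    return state + action


def toy_is_terminal(state):
    return state >= 10


def toy_reward(state):
    return 1.0 if state == 10 else 0.0


def uniform_policy(state, actions):
    # Equal weights: this is the ordinary random playout.
    return [1.0] * len(actions)


def net_like_policy(state, actions):
    # Stand-in for the network's move distribution (hypothetical numbers).
    return [0.9 if a == 1 else 0.1 for a in actions]
```

With `uniform_policy` this reduces to the usual random playout, so the guided version is a drop-in replacement for the simulation step.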