by fnbr
920 days ago
Ah, well, you could use a standard value network, but it'd end up really slow, so you probably want to train a smaller one and rely on the implicit ensembling that MCTS does to make it better. In my experience, PUCT does a lot better than UCT, so you also want a prior network. You don't have to train a new network, but in my experience it works much better. I haven't spent a ton of time using off-the-shelf networks with MCTS, though. Maybe it works great. Very subtle bugs are the MCTS experience, particularly once parallelism is involved.
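To make the PUCT vs. UCT point concrete, here's a rough sketch of the two selection scores. This is the AlphaZero-style PUCT form as I understand it, not anyone's production code; the function names, the dict layout for children, and the constants `c` and `c_puct` are all made up for illustration. The thing to notice is that PUCT uses the prior to rank unvisited children, while plain UCT treats them all as equally urgent.

```python
import math

def uct_score(parent_visits, child_visits, child_value_sum, c=1.4):
    """Classic UCT (UCB1 applied to trees). No prior is used."""
    if child_visits == 0:
        return float("inf")  # every unvisited child looks equally urgent
    q = child_value_sum / child_visits
    return q + c * math.sqrt(math.log(parent_visits) / child_visits)

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """AlphaZero-style PUCT: exploration bonus is scaled by the prior."""
    # Assumption: unvisited children get Q = 0; other initializations exist.
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(parent_visits, children, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    `children` is a list of dicts with hypothetical keys
    'visits', 'value_sum', and 'prior'.
    """
    return max(
        range(len(children)),
        key=lambda i: puct_score(
            parent_visits,
            children[i]["visits"],
            children[i]["value_sum"],
            children[i]["prior"],
            c_puct,
        ),
    )

# Two unvisited moves; the prior network strongly prefers the second one.
children = [
    {"visits": 0, "value_sum": 0.0, "prior": 0.1},
    {"visits": 0, "value_sum": 0.0, "prior": 0.9},
]
best = select_child(parent_visits=10, children=children)
```

With UCT both unvisited children score infinity, so the tie gets broken arbitrarily; with PUCT the search heads for the move the prior likes first, which matters a lot when the simulation budget is small.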