by fnbr 920 days ago
Ah, well, you could use a standard value network, but it’d end up really slow, so you probably want to train a smaller one and rely on the implicit ensembling that MCTS does to make it better.

In my experience, PUCT does a lot better than UCT, so you want to also have a prior network.
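To make the difference concrete, here is a rough sketch of the two selection scores. This is the common textbook form (UCB1-style UCT, AlphaZero-style PUCT), not necessarily the exact variant I used; the function names and the constants `c` and `c_puct` are placeholders I made up for illustration, and `prior` is whatever probability your prior network assigns to the move.

```python
import math

def uct_score(parent_visits, child_visits, child_value_sum, c=1.4):
    """UCB1-style UCT score: mean value plus a count-based exploration bonus."""
    if child_visits == 0:
        # Unvisited children are tried first; UCT has no prior to rank them.
        return math.inf
    q = child_value_sum / child_visits
    return q + c * math.sqrt(math.log(parent_visits) / child_visits)

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """AlphaZero-style PUCT score: the exploration bonus is scaled by the prior."""
    # Assumption: unvisited children get a mean value of 0.0.
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(parent_visits, children, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    `children` is a list of (visits, value_sum, prior) tuples.
    """
    return max(
        range(len(children)),
        key=lambda i: puct_score(parent_visits, *children[i], c_puct=c_puct),
    )
```

With UCT every unvisited child looks equally good, so the search has to try each one at least once. With PUCT the prior decides which unvisited child gets expanded first, which is where most of the gain comes from when the branching factor is large.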

You don’t have to train a new network, but in my experience, it works much better. I haven’t spent a ton of time using off-the-shelf networks with MCTS though. Maybe it works great.

Very subtle bugs are the MCTS experience, particularly once parallelism is involved.

1 comment

really interesting! thanks for the info!