|
|
|
|
|
by blt
2173 days ago
|
|
This insight is the essence of the AlphaZero architecture. Whereas a pure Monte Carlo Tree Search (MCTS) starts each node in the search tree with a uniform distribution over actions, AlphaZero trains a neural network to observe the game state and output a distribution over actions. This distribution is optimized to be as similar as possible to the distribution obtained from running MCTS from that state in the past. It's very similar to the way humans play games. |
|