| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by blt 2173 days ago
	This insight is the essence of the AlphaZero architecture. Whereas a pure Monte Carlo Tree Search (MCTS) starts each node in the search tree with a uniform distribution over actions, AlphaZero trains a neural network to observe the game state and output a distribution over actions. This distribution is optimized to be as similar as possible to the distribution obtained from running MCTS from that state in the past. It's very similar to the way humans play games.