Hacker News new | ask | show | jobs
by burntoutfire 2173 days ago
> No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search"?

My guess is that we're doing pattern recognision, where we recognize taht a current game state is similar to a situation that we've been in before (in some previous game), and recall the strategy we took and the outcomes it had lead to. With large enough body of experience, you can to remember lots of past attempted strategies for every kind of game state (of course, within some similarity distance).

1 comments

This insight is the essence of the AlphaZero architecture. Whereas a pure Monte Carlo Tree Search (MCTS) starts each node in the search tree with a uniform distribution over actions, AlphaZero trains a neural network to observe the game state and output a distribution over actions. This distribution is optimized to be as similar as possible to the distribution obtained from running MCTS from that state in the past. It's very similar to the way humans play games.