Does this mean it learns what to search? I wonder why they thought it was a good idea. I thought the whole point of Monte Carlo search was that pruning algorithms like the ones used in chess wouldn't work for a search space this large.
The policy network is a function from board states to a scoring of moves. Used with a greedy heuristic, i.e. picking the highest-rated move with no explicit lookahead, the policy network alone plays at a high amateur level.
This was... unexpectedly good.
It effectively reduces the branching factor of Go from the number of moves available to the number of moves actually worth considering.
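To make the two uses of the policy concrete, here is a rough Python sketch. Everything in it is made up for illustration: `toy_policy` stands in for the real network and returns hand-picked scores, and `greedy_move` / `candidate_moves` are hypothetical helper names, not anything from the actual system.

```python
from typing import Callable, Dict, List, Tuple

Move = Tuple[int, int]                          # (row, col) on the board
Policy = Callable[[object], Dict[Move, float]]  # board state -> score per move


def greedy_move(policy: Policy, board: object) -> Move:
    """Pick the highest-rated move with no lookahead at all."""
    scores = policy(board)
    return max(scores, key=scores.get)


def candidate_moves(policy: Policy, board: object, k: int) -> List[Move]:
    """Keep only the k best-rated moves, so a search expands k
    children per position instead of every available move."""
    scores = policy(board)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def toy_policy(board: object) -> Dict[Move, float]:
    """Stand-in for a trained network: fixed, invented scores (assumption)."""
    return {(3, 3): 0.50, (15, 15): 0.30, (3, 15): 0.15, (0, 0): 0.05}


if __name__ == "__main__":
    print(greedy_move(toy_policy, None))
    print(candidate_moves(toy_policy, None, k=2))
```

The first function is the "greedy heuristic" player; the second is the pruning idea, where the search only ever looks at the handful of moves the policy rates highly.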