by xoroshiro 3166 days ago
Does this mean it learns what to search? I wonder why they thought it was a good idea. I thought the whole point of Monte Carlo (MC) was that pruning algorithms like the ones in chess wouldn't work for a larger search space.
2 comments

The policy network is a function from board states to a scoring of moves. With the greedy heuristic, i.e. picking the highest-rated move with no explicit lookahead, the policy network alone plays at a high amateur level.

This was... unexpectedly good.
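
To make the greedy heuristic above concrete, here is a minimal sketch. The names `greedy_move` and `toy_policy`, and the hard-coded scores, are all made up for illustration; a real policy network would compute the scores from the board state.

```python
from typing import Callable, Dict


def greedy_move(board, policy: Callable[[object], Dict[str, float]]) -> str:
    """Pick the highest-rated move, with no lookahead at all."""
    scores = policy(board)              # board state -> score per move
    return max(scores, key=scores.get)  # move with the largest score


def toy_policy(board) -> Dict[str, float]:
    # Hypothetical stand-in for a trained network: it ignores the board
    # and returns fixed scores so the example is runnable.
    return {"D4": 0.1, "Q16": 0.7, "K10": 0.2}


best = greedy_move(None, toy_policy)
print(best)
```

There is no search here at all: one evaluation of the policy, one `max`.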

It effectively reduces the branching factor of Go from the number of moves available to the number of moves actually worth considering.

That's what the policy is. Given a board state, the policy gives you a distribution over all available moves.
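
A rough sketch of both ideas, assuming the network emits one raw score (logit) per move: a softmax restricted to the legal moves gives the distribution, and keeping only the top few entries gives the reduced set of moves to search. The function names, the toy logits, and the hard top-k cutoff are my own simplifications, not necessarily how the system under discussion uses its policy.

```python
import math
from typing import Dict, List


def move_distribution(logits: Dict[str, float], legal_moves: List[str]) -> Dict[str, float]:
    """Softmax over the legal moves only; illegal moves get no probability."""
    legal_logits = {m: logits[m] for m in legal_moves}
    peak = max(legal_logits.values())  # subtract the max for numerical stability
    exps = {m: math.exp(v - peak) for m, v in legal_logits.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def worth_considering(dist: Dict[str, float], k: int) -> List[str]:
    """Keep the k most probable moves: the reduced branching factor."""
    return sorted(dist, key=dist.get, reverse=True)[:k]


# Toy scores, invented for the example. "T19" is treated as illegal here.
logits = {"D4": 0.2, "Q16": 2.1, "K10": 1.3, "A1": -3.0, "T19": -2.5}
legal = ["D4", "Q16", "K10", "A1"]

dist = move_distribution(logits, legal)
shortlist = worth_considering(dist, k=2)
print(shortlist)
```

A search that expands only `shortlist` at each node branches k ways instead of once per legal move, which is the reduction described above.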