by letlambda 3168 days ago
The policy network is a function from board states to a scoring of moves. Paired with a greedy heuristic, i.e. picking the highest-rated move with no explicit lookahead, the policy network plays at a high amateur level.
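
To make the greedy heuristic concrete, here is a rough sketch in Python. The real policy network is a trained neural net; the `scores` dict below is just made-up numbers standing in for its output, and `greedy_move` is a hypothetical name, not anything from the actual system.

```python
def greedy_move(policy_scores, legal_moves):
    """Pick the highest-scored legal move, with no lookahead at all."""
    return max(legal_moves, key=lambda move: policy_scores[move])


# Made-up scores standing in for the network's output on one board state.
scores = {"D4": 0.41, "Q16": 0.35, "K10": 0.23, "A1": 0.01}

print(greedy_move(scores, ["D4", "Q16", "K10", "A1"]))  # prints D4
```

The point is that there is no search here: one evaluation of the scoring function per move played, and that is all.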

This was... unexpectedly good.

It effectively reduces the branching factor of Go from the number of moves available to the number of moves actually worth considering.
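
A back-of-the-envelope sketch of why that matters for search, assuming you keep only the top k moves by policy score at each node. The figures 250 and 5 are illustrative assumptions (roughly the commonly quoted branching factor for Go, and an arbitrary k), and both function names are hypothetical.

```python
def candidate_moves(policy_scores, k):
    """Keep only the k moves the policy rates highest."""
    ranked = sorted(policy_scores, key=policy_scores.get, reverse=True)
    return ranked[:k]


def tree_size(branching, depth):
    """Number of leaf positions in a uniform tree of the given depth."""
    return branching ** depth


print(candidate_moves({"D4": 0.41, "Q16": 0.35, "A1": 0.01}, 2))  # ['D4', 'Q16']

# Illustrative numbers only: ~250 available moves vs. 5 worth considering.
print(tree_size(250, 4))  # 3906250000
print(tree_size(5, 4))    # 625
```

Even at a depth of four plies, pruning to a handful of candidates shrinks the tree by several orders of magnitude, provided the policy's top picks actually contain the good moves.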