by letlambda
3168 days ago
|
|
The policy network is a function from board states to a scoring of moves. Paired with a greedy heuristic, i.e. picking the highest-rated move with no explicit lookahead, the policy network alone plays at a high amateur level. This was... unexpectedly good. It effectively reduces the branching factor of Go from the number of moves available to the number of moves actually worth considering. |
|
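To make the greedy heuristic concrete, here is a rough sketch in Python. It is not the actual system's code: the `Move` encoding, the function names, and the example scores are all made up for illustration, and a plain dict stands in for the policy network's output. `greedy_move` is the "pick the highest-rated move" rule, and `top_k_moves` shows how the same scores could prune a search down to the handful of moves worth considering.

```python
from typing import Dict, Iterable, List, Tuple

Move = Tuple[int, int]  # (row, col) on the board; a hypothetical encoding


def greedy_move(scores: Dict[Move, float], legal_moves: Iterable[Move]) -> Move:
    """Pick the legal move the policy rates highest, with no lookahead."""
    legal = list(legal_moves)
    if not legal:
        raise ValueError("no legal moves to choose from")
    # Moves the policy did not score are treated as worst possible.
    return max(legal, key=lambda m: scores.get(m, float("-inf")))


def top_k_moves(
    scores: Dict[Move, float], legal_moves: Iterable[Move], k: int
) -> List[Move]:
    """Keep only the k best-rated legal moves, best first.

    A search that expands just these candidates has branching factor k
    instead of the full number of legal moves.
    """
    legal = list(legal_moves)
    ranked = sorted(legal, key=lambda m: scores.get(m, float("-inf")), reverse=True)
    return ranked[:k]


# Made-up scores standing in for a real policy network's output.
example_scores = {(3, 3): 0.41, (15, 15): 0.33, (2, 9): 0.02, (0, 0): 0.001}
example_legal = [(3, 3), (15, 15), (2, 9), (0, 0)]

best = greedy_move(example_scores, example_legal)
candidates = top_k_moves(example_scores, example_legal, k=2)
```

With these example numbers, `best` is `(3, 3)` and `candidates` holds the two highest-rated points, so a search built on top would look at 2 moves per position rather than all 4.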