by mtremsal
3264 days ago
My understanding is that innovation comes from reinforcement learning during self-play (rather than from supervised learning on pro games): it lets AlphaGo go against the moves its policy network rates as best, which in turn pushes it towards new options. In a sense, innovation seems to arise when the value network forces the search to look beyond the policy network's top suggestions, because an apparently unlikely move leads to downstream positions deemed favorable.
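
To make that last point concrete, here is a toy sketch of a PUCT-style selection rule, where a move's score is its running value estimate plus an exploration bonus weighted by the policy prior. This is not AlphaGo's actual implementation: the two moves, their priors, the fixed value estimates (standing in for a value network), the `c_puct` constant, and the exact form of the bonus are all assumptions made up for illustration.

```python
import math

# Hypothetical numbers for illustration only.
# PRIORS: how likely a policy network considers each move.
# VALUES: fixed evaluation of the resulting position (stand-in for a value network).
PRIORS = {"conventional": 0.9, "unusual": 0.1}
VALUES = {"conventional": 0.4, "unusual": 0.7}


def search(simulations, c_puct=1.0):
    """Run a one-level PUCT-style search and return visit counts per move."""
    visits = {move: 0 for move in PRIORS}
    mean_value = {move: 0.0 for move in PRIORS}

    for _ in range(simulations):
        total = sum(visits.values())

        def score(move):
            # Exploration bonus: large for high-prior moves, shrinking with visits.
            bonus = c_puct * PRIORS[move] * math.sqrt(total + 1) / (1 + visits[move])
            return mean_value[move] + bonus

        move = max(PRIORS, key=score)
        visits[move] += 1
        # Incremental mean of the evaluations seen for this move.
        mean_value[move] += (VALUES[move] - mean_value[move]) / visits[move]

    return visits


def most_visited(visits):
    """The move a search of this kind would typically end up playing."""
    return max(visits, key=visits.get)


if __name__ == "__main__":
    for budget in (10, 200):
        result = search(budget)
        print(budget, result, most_visited(result))
```

With a small simulation budget the high-prior move collects all the visits. With a larger budget the bonus eventually sends the search to the low-prior move, and once its better evaluation is seen it accumulates most of the visits, which is the "unlikely move, favorable downstream positions" effect in miniature.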