Hacker News new | ask | show | jobs
by mtremsal 3264 days ago
My understanding is that innovation comes from reinforcement learning during self-play (rather than supervised learning of pro games), and thus goes against the best moves suggested by AlphaGo's policy network, in turn pushing it towards new options.

In a sense, it seems innovation arises when the value network forces the policy network to expand the search space because an apparently unlikely move leads to downstream positions deemed favorable.

1 comments

It's not that simple. The creativity is that the combination of rollouts, policy and value networks allow for more efficient traversal of the search space. Which gets you better exploration of possible paths, meaning more options than a human considered and therefore more creativity.