Hacker News new | ask | show | jobs
by titzer 3747 days ago
Speculating that the reinforcement learning phase reinforced all the best winning strategies but had few examples of weak positions out of which the AI had to fight.
1 comments

AlphaGo lost half the games it played against itself, so it's not like it doesn't have millions of training examples. However maybe it didn't learn very well how to recover once it's losing, but rather concentrated on learning how to avoid that in the first place.