| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by titzer 3747 days ago
	Speculating that the reinforcement learning phase reinforced all the best winning strategies but had few examples of weak positions out of which the AI had to fight.

1 comments

versteegen 3747 days ago

AlphaGo lost half the games it played against itself, so it's not like it doesn't have millions of training examples. However maybe it didn't learn very well how to recover once it's losing, but rather concentrated on learning how to avoid that in the first place.

link