Speculating that the reinforcement learning phase reinforced all the best winning strategies but had few examples of weak positions out of which the AI had to fight.
AlphaGo lost half the games it played against itself, so it's not like it doesn't have millions of training examples. However maybe it didn't learn very well how to recover once it's losing, but rather concentrated on learning how to avoid that in the first place.