| This was the question which originally led me to lose faith in deep learning for solving go. Existing research throws a bunch of professional games at a DCNN and trains it to predict the next move. It generally does quite well but fails hilariously when you give it a situation which never comes up in pro games. Go involves lots of implicit threats which are rarely carried out. These networks learn to make the threats but, lacking training data, are incapable of following up. The first step of creating AlphaGo worked the same way (and actually was worse at predicting the next move than current state of the art), but Deep Mind then took that base network and retrained it. Instead of playing the move a pro would play it now plays the move most likely to result in a win. For pros, this is the same move. But for AlphaGo, in this completely different MCTS environment, they are quite different. Deep Mind then played the engine against older versions of itself and used reinforcement learning to make the network as accurate as possible. They effectively used the human data to bootstrap a better player. The paper used a lot of other cool techniques and optimizations, but I think this one might be the coolest. |