| You play out the game to the end. Nope. You could play out to one or several ends, but there is a huge number of possible end games at any given point. Simply saying 'play out the game to the end' is a gross simplification. The average branching factor is something like 200, so just to fully evaluate just 5 moves ahead would require traversing 320 billion game states. Go games are 100-200 moves per game, so 5 moves ahead doesn't even get you close to the end of a game. At the first move there are 208168199381979984699478633344862770286522453884530548425639456820927419612738015378525648451698519643907259916015628128546089888314427129715319317557736620397247064840935 legal board states you'd have traverse to reach all end games. Yes, that's an exact figure. (http://tromp.github.io/go/legal.html) How big is that figure? If you counted up the number of fundamental particles in the entire universe, and for each fundamental particle there was another complete universe of fundamental particles and you counted all of those too, and then do all of that once for every person in the USA add up their answers, you'd be at about 3% of the way to that value. You simply cannot do a full evaluation to the end of the game: it's computationally infeasible. That's exactly how AlphaGo works though (minimax) No, it's not. Minimax requires working back from all end-game states, and there are too many end-game states to be evaluated. I suggest you read the paper yourself. It's about doing random partial evaluations and forward-looking searches of the game space, and choosing clever subsets to evaluate, precisely because you can't evaluate them all. The 'backing up' referred to in the paper is backing evaluation values up the tree after you've looked ahead a while, not starting at all possible end games and backing up from there, as in classic minimax. Some of the basic concepts are similar to classic tic-tac-toe or chess algorithms, but combining them in new ways. AlphaGo uses the same technique, just optimized a bit more. And a fighter jet uses the same technique as a paper aeroplane, only optimized a bit more... |
You can sample several randomly played out games. I think you knew this, and I hope you understand that sampling works, but for some reason you went on a completely irrelevant tangent.
Minimax requires working back from all end-game states
Practical implementations that need to achieve good but not perfect play are content to stop at a point before that.
Again, do you know this and are intentionally misunderstanding me and throwing up irrelevant facts? I'm sure you did given that you even acknowledge the above (no need for terminal states) in your previous post.
The difficulty of Go was that there was no way to combine tree search with the uncertainty inherent in Monte Carlo samples with a low sample count. The paper I mentioned fixed this, and hence was THE major breakthrough. The resulting algorithm is still quite simple, yet plays arbitrarily strong, also in practice, when given more computing resources.
It makes a minimax-like algorithm work for go because it estimates the terminal results by Monte Carlo sampling the results of the games when played out by a random (or in practise, not entirely random) player.