|
|
|
|
|
by gcp
3737 days ago
|
|
It's not at all easy to say who's winning or losing at most points in the game, or if there even is a winner or a loser. You play out the game to the end. making backward-looking techniques like minimax impossible, even if you could easily assign an evaluation function to each state. That's exactly how AlphaGo works though. Read this paper, it was the landmark result that showed how to do this: http://www.remi-coulom.fr/CG2006/ AlphaGo uses the same technique, just optimized a bit more. |
|
Nope. You could play out to one or several ends, but there is a huge number of possible end games at any given point. Simply saying 'play out the game to the end' is a gross simplification.
The average branching factor is something like 200, so just to fully evaluate just 5 moves ahead would require traversing 320 billion game states. Go games are 100-200 moves per game, so 5 moves ahead doesn't even get you close to the end of a game. At the first move there are 208168199381979984699478633344862770286522453884530548425639456820927419612738015378525648451698519643907259916015628128546089888314427129715319317557736620397247064840935 legal board states you'd have traverse to reach all end games. Yes, that's an exact figure. (http://tromp.github.io/go/legal.html)
How big is that figure? If you counted up the number of fundamental particles in the entire universe, and for each fundamental particle there was another complete universe of fundamental particles and you counted all of those too, and then do all of that once for every person in the USA add up their answers, you'd be at about 3% of the way to that value.
You simply cannot do a full evaluation to the end of the game: it's computationally infeasible.
That's exactly how AlphaGo works though (minimax)
No, it's not. Minimax requires working back from all end-game states, and there are too many end-game states to be evaluated. I suggest you read the paper yourself. It's about doing random partial evaluations and forward-looking searches of the game space, and choosing clever subsets to evaluate, precisely because you can't evaluate them all. The 'backing up' referred to in the paper is backing evaluation values up the tree after you've looked ahead a while, not starting at all possible end games and backing up from there, as in classic minimax. Some of the basic concepts are similar to classic tic-tac-toe or chess algorithms, but combining them in new ways.
AlphaGo uses the same technique, just optimized a bit more.
And a fighter jet uses the same technique as a paper aeroplane, only optimized a bit more...