| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by zero_iq 3738 days ago

I think you're greatly underestimating how difficult it is to come up with an algorithm to play Go.

Even evaluating a single play state is a difficult problem, let alone looking forwards to 'simply taking the route giving max probability of success'. That's not simple at all.

Go is fiendishly difficult to evaluate. The traditional techniques that work in chess don't work well in Go (if at all). All the pieces have the same 'value'. More or less pieces isn't necessarily better or worse, it depends on the board layout. It's not at all easy to say who's winning or losing at most points in the game, or if there even is a winner or a loser. There is no mathematically-obvious way to compare two play states and say one is better than the other. The branching factor is huge, making the usual forward-looking techniques extraordinarily expensive. The number of possible legal boards states is something like 90 orders of magnitude greater than the number of atoms in the observable universe, making backward-looking techniques like minimax impossible, even if you could easily assign an evaluation function to each state.

1 comments

gcp 3737 days ago

It's not at all easy to say who's winning or losing at most points in the game, or if there even is a winner or a loser.

You play out the game to the end.

making backward-looking techniques like minimax impossible, even if you could easily assign an evaluation function to each state.

That's exactly how AlphaGo works though. Read this paper, it was the landmark result that showed how to do this: http://www.remi-coulom.fr/CG2006/

AlphaGo uses the same technique, just optimized a bit more.

link

zero_iq 3737 days ago

You play out the game to the end.

Nope. You could play out to one or several ends, but there is a huge number of possible end games at any given point. Simply saying 'play out the game to the end' is a gross simplification.

The average branching factor is something like 200, so just to fully evaluate just 5 moves ahead would require traversing 320 billion game states. Go games are 100-200 moves per game, so 5 moves ahead doesn't even get you close to the end of a game. At the first move there are 208168199381979984699478633344862770286522453884530548425639456820927419612738015378525648451698519643907259916015628128546089888314427129715319317557736620397247064840935 legal board states you'd have traverse to reach all end games. Yes, that's an exact figure. (http://tromp.github.io/go/legal.html)

How big is that figure? If you counted up the number of fundamental particles in the entire universe, and for each fundamental particle there was another complete universe of fundamental particles and you counted all of those too, and then do all of that once for every person in the USA add up their answers, you'd be at about 3% of the way to that value.

You simply cannot do a full evaluation to the end of the game: it's computationally infeasible.

That's exactly how AlphaGo works though (minimax)

No, it's not. Minimax requires working back from all end-game states, and there are too many end-game states to be evaluated. I suggest you read the paper yourself. It's about doing random partial evaluations and forward-looking searches of the game space, and choosing clever subsets to evaluate, precisely because you can't evaluate them all. The 'backing up' referred to in the paper is backing evaluation values up the tree after you've looked ahead a while, not starting at all possible end games and backing up from there, as in classic minimax. Some of the basic concepts are similar to classic tic-tac-toe or chess algorithms, but combining them in new ways.

AlphaGo uses the same technique, just optimized a bit more.

And a fighter jet uses the same technique as a paper aeroplane, only optimized a bit more...

link

gcp 3737 days ago

You simply cannot do a full evaluation to the end of the game: it's computationally infeasible.

You can sample several randomly played out games. I think you knew this, and I hope you understand that sampling works, but for some reason you went on a completely irrelevant tangent.

Minimax requires working back from all end-game states

Practical implementations that need to achieve good but not perfect play are content to stop at a point before that.

Again, do you know this and are intentionally misunderstanding me and throwing up irrelevant facts? I'm sure you did given that you even acknowledge the above (no need for terminal states) in your previous post.

The difficulty of Go was that there was no way to combine tree search with the uncertainty inherent in Monte Carlo samples with a low sample count. The paper I mentioned fixed this, and hence was THE major breakthrough. The resulting algorithm is still quite simple, yet plays arbitrarily strong, also in practice, when given more computing resources.

It makes a minimax-like algorithm work for go because it estimates the terminal results by Monte Carlo sampling the results of the games when played out by a random (or in practise, not entirely random) player.

link

zero_iq 3737 days ago

I think the problem is that what you meant is very different from what you actually wrote.

Practical implementations that need to achieve good but not perfect play are content to stop at a point before that.

Indeed, but that's not the basic minimax algorithm (which is what I was discussing) and the branching factor of go makes classic minimax infeasible even if you modify it to be forward-looking. Basic minimax is not good for Go. At all. You have to combine it with other techniques for it to be even vaguely effective. You know this, but didn't mention it. So how is anyone supposed to infer that you know it?

I can see now that you have a decent understanding of the actual solution, but that's not at all what you wrote above.

For example, you said to evaluate a game board you need to 'play the game through to the end' -- which hides all that complexity of monte-carlo tree evaluation, etc. Read what you wrote again from the point of view of someone who isn't yourself.

Simply saying 'Play the game through to the end' by itself is neither correct nor accurate. Because that implies evaluating the rest of the game to its conclusion. i.e. every board state. You omitted any other detail or reference. So I descibed how and why it's not feasible (which you now discount as going off at a tangent). You gave no prior indication of understanding this.

Another example: in a previous comment I described how backward looking techniques like the minimax algorithm are not feasible for go, and you said:

"That's exactly how AlphaGo works though".

But it isn't at all:

Classic minimax is backward-looking (starts at end states and works backwards -- this is what I was talking about, and what I described); AlphaGo (and the paper you linked to) is forward-looking (starts at current root and works forwards).

Classic minimax is full-tree evaluation; AlphaGo is partially-evaluating.

Minimax is deterministic; AlphaGo is stochastic/random sampling.

Minimax uses a simple single-value single-direction evaluation function; AlphaGo uses a complex bi-directional multi-value evaluation.

I could go on.

Read any treatise or any introduction to AI for Go and pretty much the first chapter will be describing how classic minimax is not suitable without heavy modification.

e.g. https://www.youtube.com/watch?v=5oXyibEgJr0

AlphaGo uses some of the same concepts, but applied very differently, as you appear to already know. Yet after I mentioned minimax (which is well-known as an infeasible algorithm for go), you responded:

"That's exactly how AlphaGo works though".

Which is not at all true.

And then you link to a paper that describes precisely how it's not how it works, and why minimax-related techniques alone don't work, in stark contradiction to yourself.

I'm not intentionally misinterpreting what you wrote: I'm taking a reasonable interpretation of what you actually wrote, which appears to show a lack of understanding on your part. I don't think there is a lack of understanding, now, but it's hard to come to a different conclusion based on what you wrote before.

link