by feral 4482 days ago
My impression is that you don't understand how these things work, so I'll start from the basics; apologies if wrong.

First off, the chess counter-example was in response to something general ovolve said about 'all minimax algos'.

More broadly, in practice the issue is that you only have computational resources to search a fraction of possible game states. So where do you direct your limited resources?

Some candidate strategies: minimax with AB, exhaustive, MC sampling.

Even with any of these strategies, we can usually only search the game state tree to a given depth. (This is the tree of 'If I go here, then the game goes there, then I go here etc'.) When we reach our depth limit (if we haven't reached a terminal state - i.e. a win or loss), we use a heuristic (maybe 'number of free tiles') which approximates the value of the state to us at that point.

>Meanwhile, the computer has only a few possible moves, increasing the probability of doing something accidentally optimal.

If the computer only has a few possible moves, then, yes, it might randomly choose the best one. But if it only has a few possible moves, chances are that any tree search technique, even stochastic, will start to expand all of them.

The question is, though, how will you know how good each of these moves is when you start examining them? I.e. which of the moves available to you right now should you make? With minimax/AB, you'll get a sense of which move is optimal by looking at the consequences of the move: (if the game makes the worst response for me (if I make the best move for me (if the game makes the worst response for me ... (heuristic evaluation)))).
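As a minimal sketch of that nested evaluation (an illustration only, not ovolve's code): the Python below assumes a board stored as a list of lists with 0 for an empty cell, treats the game as an adversary that places the worst possible 2 or 4, and uses the 'number of free tiles' heuristic at the depth limit. All function names are made up for the example.

```python
def slide_row_left(row):
    """Slide one row to the left, merging each pair of equal neighbours once."""
    tiles = [t for t in row if t]
    out = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))


def move(board, direction):
    """Apply a move; direction is 0=left, 1=right, 2=up, 3=down."""
    if direction in (2, 3):            # work on columns by transposing
        board = [list(c) for c in zip(*board)]
    if direction in (1, 3):            # mirror so we can always slide left
        board = [r[::-1] for r in board]
    board = [slide_row_left(r) for r in board]
    if direction in (1, 3):
        board = [r[::-1] for r in board]
    if direction in (2, 3):
        board = [list(c) for c in zip(*board)]
    return board


def free_tiles(board):
    """Heuristic: number of empty cells."""
    return sum(cell == 0 for row in board for cell in row)


def spawns(board):
    """Every board the game could produce by placing a 2 or a 4."""
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == 0:
                for tile in (2, 4):
                    child = [list(x) for x in board]
                    child[r][c] = tile
                    yield child


def legal_moves(board):
    """(direction, resulting board) for every move that changes the board."""
    results = [(d, move(board, d)) for d in range(4)]
    return [(d, b) for d, b in results if b != board]


def minimax(board, depth, my_turn):
    """Depth-limited minimax: I maximise, the game is assumed to minimise."""
    if depth == 0:
        return free_tiles(board)       # depth limit: fall back to the heuristic
    if my_turn:
        options = [minimax(b, depth - 1, False) for _, b in legal_moves(board)]
        return max(options) if options else -1   # no legal move: a loss
    replies = [minimax(b, depth - 1, True) for b in spawns(board)]
    return min(replies) if replies else free_tiles(board)


def best_move(board, depth=3):
    """Direction with the best minimax value, or None if no move is legal."""
    scored = [(minimax(b, depth - 1, False), d) for d, b in legal_moves(board)]
    return max(scored)[1] if scored else None
```

There is no pruning here, so the cost grows quickly with `depth`; it is only meant to show where the heuristic and the worst-case assumption enter.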

With (sensible) MC search, you'll instead get a sense of which move is optimal by looking at more like what happens (if I make the best move for me (for each of a bunch of random moves the game could make (then if I make the best move for me (for each of a bunch of random moves the game could make ... (heuristic evaluation)))).
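A rough sketch of the sampled version, on a hand-built toy tree rather than a real 2048 board. The tree encoding ("max" nodes for my moves, "chance" nodes holding (probability, child) pairs for the game's replies, plain numbers for heuristic evaluations) and all the numbers are assumptions invented for illustration.

```python
import random


def sampled_value(node, rng, samples=200):
    """Estimate a node's value by sampling the game's random replies."""
    if isinstance(node, (int, float)):       # leaf: heuristic evaluation
        return node
    kind, children = node
    if kind == "max":                        # my turn: take the best move
        return max(sampled_value(c, rng, samples) for c in children)
    # Game's turn: draw replies according to their probabilities and average.
    weights = [p for p, _ in children]
    outcomes = [c for _, c in children]
    draws = rng.choices(outcomes, weights=weights, k=samples)
    return sum(sampled_value(c, rng, samples) for c in draws) / samples


def best_sampled_move(root, rng, samples=200):
    """Index of the root move with the highest sampled value."""
    _, moves = root
    scores = [sampled_value(m, rng, samples) for m in moves]
    return scores.index(max(scores))


# Toy position: move A is usually great but occasionally awful,
# move B is always mediocre.
tree = ("max", [
    ("chance", [(0.9, 10), (0.1, 0)]),   # move A
    ("chance", [(1.0, 3)]),              # move B
])

print(best_sampled_move(tree, random.Random(0), samples=2000))
```

On this toy tree the sampled search prefers move A, whose average is about 9, while a worst-case evaluation would score A as 0 and pick B.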

My point is that the latter is more suitable for this domain.

>As you approach the end, needing over half the board just for unbuilt path up to 1024 and thereby not having as much scratch space, the probability of it hitting a problematic (even if not "devastating", one that suddenly requires you to reorganize things to "clean up the mess") move seems more of a problem than when playing chess.

Well, then, if the probability of it randomly hitting a problematic move is high in a certain state, your MC search will be highly likely to come across that move, and thus you'll be likely to avoid moves that bring you to that state. So, no problem.
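To put a number on "highly likely to come across that move": assuming the samples are independent and the game picks the problematic reply with probability p each time, the chance that n samples include it at least once is 1 - (1 - p)^n. The function name below is made up for the example.

```python
def chance_of_seeing(p, n):
    """Probability that at least one of n independent samples hits a reply
    that the game chooses with probability p."""
    return 1.0 - (1.0 - p) ** n


# A reply the game makes half the time is almost certainly seen in 10 samples;
# a 1-in-100 reply usually is not.
print(chance_of_seeing(0.5, 10))
print(chance_of_seeing(0.01, 10))
```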

In summary: 1) If the game is highly likely to make a devastating move by random chance, then you are highly likely to come across that move in your stochastic search, so that's not really an objection. 2) In this game, just as in chess, there are always more states you'd like to search and expand than you have the resources to. Even if the computer only has a handful of 'moves' at any time, in order to tell how good these moves are, you've still got to expand out a lot of states. Choosing to use minimax instead of MC doesn't save you from that.


I am having a difficult time figuring out how to respond to this comment given that the assumption going into this conversation is that you are wrong and we are only interested in figuring out why you are wrong. For avoidance of doubt, we are assuming you are wrong because ovolve claimed to have implemented an algorithm matching your description, that implementation was available for your perusal (it is only commented out, not deleted), and it didn't actually work better. Your comments, however, seem to continue to operate from the assumption that you are correct.

Also, while I do not have the background you do with search functions, I didn't feel like my comment (which I saw as offering an idea more than a proof: a comment about intuitions based on having wasted way way too much time playing 2048 yesterday and from being in the chess club at a different time in my life) warranted the "let me teach you the basics" paragraph, especially under the "we are working together to figure out why you are wrong" assumption. I am sufficiently confused by these differing approaches to the conversation as to not be certain how to proceed.

Like, "huh, ok, if you had a different idea for why you are wrong, what would it be?" is all I can come up with, but I don't think that fits your side of this interaction. (Maybe, if you simply feel you aren't wrong, you could look at ovolve's code and find something "wrong"/suboptimal with his algorithm? I assumed you had already done this, given the context, but maybe not? Clearly my assumptions are failing here.) I think I will just bow out, actually get some work done, and maybe ask my friends (who have much more experience in this space than, to my belief, either of us) to explain this to me later ;P.

I am with saurik.

The utility function is an approximation. So MCMC is aggregating information over an erroneous space, and min/max is optimizing over an erroneous space too.

Which one is the correct thing to do depends on how the utility function behaves. In this scenario I think min/max plays maximally conservatively, which empirically seems to be the best thing to do.

If the utility is to maximize free space on the board, min/max will try to get to an empty board but doesn't take risks, so it gets there by a suboptimal route. The MCMC will take the odd risky move as long as a large proportion of the futures lead to an even emptier board.

Clearly you don't want risky moves, because do enough of them and it ends in disaster (and it's a long game). So the utility function should be exponentially weighted against going near risky situations. However, developing such a utility function which combines well in MCMC really requires understanding too much about the future game dynamics.

For MCMC to work, the utility function really needs to capture how potentially bad a situation is, and I don't think that is easy. Min/max is naturally pessimistic in strategy, which is probably the correct thing to do.

I agree that all approaches are approximations, and which would actually perform best in this game is an empirical question. It'd take some work to really explore and tune either minimax or MC approach, so I wouldn't throw either out due to one failed attempt.

I accept the argument that "minimax is conservative, and conservative is good" might be correct. But I don't think it's likely, and, without the time to code my own solution, all I can do is give arguments to that end.

I gave one intuitive argument here: https://news.ycombinator.com/item?id=7381382

Another argument is to remember that minimax, and AB-pruning, is a really strong way of reducing your search space - because of how unfavourable adversary moves are propagated up the tree - which could result in drastic pruning if the minimax assumption is wrong.

One 'bad' state, 6 or 8 ply deep through your branching factor ~10 tree, can result in you pruning entire lines of enquiry using minimax AB; surely that can't be right if the chance of the bad state happening is tiny, and especially if an alternative is chosen which isn't much better.
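A tiny illustration of that pruning effect, using a standard alpha-beta routine on an invented two-ply tree (nested lists are positions, numbers are heuristic values; the tree and its values are assumptions made up for the example). The `visited` list records which leaves were actually evaluated.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
              visited=None):
    """Minimax value of `node` with alpha-beta pruning."""
    if visited is None:
        visited = []
    if not isinstance(node, list):        # leaf: heuristic evaluation
        visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:             # the opponent would never allow this
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:                 # I already have a better option
            break
    return value


# Move A leads to replies worth 3 or 4; move B has one bad reply (0)
# and two very good ones (9).
tree = [[3, 4], [0, 9, 9]]
seen = []
value = alphabeta(tree, True, visited=seen)
print(value, seen)   # 3 [3, 4, 0]
```

As soon as the single 0 under move B is found, the rest of that line is discarded without looking at the two 9s, regardless of how unlikely the 0 would be in actual play.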

So I still think that if you want to tune the search algorithm to be risk averse, then, yes, do so; but minimax is a drastic way to achieve that. But yes, how big of an effect that decision has in practice depends on complicated things, such as correlations in the search tree. (i.e. if you find a bad state in a section of the game tree, maybe there are likely to be other bad states nearby, so it's not such a big deal to prune that whole section using minimax).

I was initially surprised that minimax worked at all for the exact reasons you cite. But it actually matches my intuition from playing the game kind of a lot. I think it's generally the case that most "opponent" moves are completely OK but there's one move that results in a high probability of loss (the one that's behind your "stack"). Avoiding worst-case scenarios is really the entire game (i.e. your example wherein low probability of certain loss is being compared unfavorably to a guaranteed 99% loss rate does not actually come up). So if you built a really good, comprehensive MC approach, it might just reduce itself to minimax anyway. Thus perhaps the minimax is just a simpler way to implement the right approach from the get-go.
Actually I thought about it more. You only need to decide 1 of 4 moves at the beginning, and there are a few approaches:

1. min max to a finite horizon using a heuristic utility function (as implemented)

2. Dynamic program/MCMC to a finite horizon and use the heuristic. Good at modelling the opponent behaviour, but could lead to bad results with a bad heuristic. (commented out approach)

3. Sample till the game ends (infinite horizon), pick the first move that led to the game that went the longest (or won). This avoids developing an ad hoc heuristic.

So now I vote for 3. :p
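Approach 3 could be sketched roughly as follows. Assumptions made for the example: the board is a list of lists with 0 for an empty cell, playouts use uniformly random legal moves, a new tile is a 2 nine times out of ten and otherwise a 4, and "went the longest" is measured in moves survived. All names are hypothetical.

```python
import random


def slide_row_left(row):
    """Slide one row to the left, merging each pair of equal neighbours once."""
    tiles = [t for t in row if t]
    out = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))


def move(board, direction):
    """Apply a move; direction is 0=left, 1=right, 2=up, 3=down."""
    if direction in (2, 3):
        board = [list(c) for c in zip(*board)]
    if direction in (1, 3):
        board = [r[::-1] for r in board]
    board = [slide_row_left(r) for r in board]
    if direction in (1, 3):
        board = [r[::-1] for r in board]
    if direction in (2, 3):
        board = [list(c) for c in zip(*board)]
    return board


def add_random_tile(board, rng):
    """Place a 2 (90%) or a 4 (10%) on a random empty cell, in place."""
    empty = [(r, c) for r, row in enumerate(board)
             for c, cell in enumerate(row) if cell == 0]
    if empty:
        r, c = rng.choice(empty)
        board[r][c] = 4 if rng.random() < 0.1 else 2
    return board


def playout_length(board, rng):
    """Play random legal moves until stuck; return the number of moves made."""
    steps = 0
    while True:
        options = [b for b in (move(board, d) for d in range(4)) if b != board]
        if not options:
            return steps
        board = add_random_tile(rng.choice(options), rng)
        steps += 1


def choose_move(board, rng, playouts=20):
    """First move of the longest random playout, or None if stuck."""
    best_direction, best_length = None, -1
    for d in range(4):
        after = move(board, d)
        if after == board:
            continue                      # illegal: nothing moved
        for _ in range(playouts):
            start = add_random_tile([row[:] for row in after], rng)
            length = playout_length(start, rng)
            if length > best_length:
                best_direction, best_length = d, length
    return best_direction
```

Taking the single longest playout follows the wording above literally; averaging playout lengths per first move would be a less noisy variation.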