Hacker News new | ask | show | jobs
by ovolve 4480 days ago
Heh. If you look in the code you'll see a big commented out chunk where I tried randomly sampling computer moves to get sort of an 'expected value' for the opposition's move. Empirically, it performed worse. I think this is for the same reason that all minimax algos assume optimal play by the opponent: if you assume optimal and they play less than so, it can only work in your favor. However I think there's some truth in the fact that sometimes an unpredicted random computer move can mess things up. Unfortunately exhaustively enumerating all possible computer moves was way too slow (for single-threaded javascript in the browser).
5 comments

Its interesting that your approach performed worse; I wonder if it could be modified to do better? Interesting.

>I think this is for the same reason that all minimax algos assume optimal play by the opponent: if you assume optimal and they play less than so, it can only work in your favor.

That doesn't make sense when talking about a random opponent, though.

Imagine its chess. You are considering moving a pawn into a position where it can be obviously taken with no cost by the other player, but where, if the other player doesn't take the pawn, you'll get a sure checkmate on the next go.

You'd never make that move against an 'intelligent' player (i.e. in a minimax setup). But you'd definitely consider it against an enemy that moves randomly, because the expected value is so high.

This isn't to discount your empirical experience with this game, just to make a more general point.

When playing against a random opponent, sometimes it will "accidentally" make the best possible move. In chess, this might be extremely unlikely, as there are so many possible moves: the probability that, of all the pieces on the board, with all of the positions it could have made, that it would stumble upon a move that devastates your position--especially considering that often multiple correct moves must be performed in specific sequence to take advantage of what "should be" a winning position--is vanishingly small.

Further: the potential for gain (trivial mate against a stupid opponent in a handful of moves) is great. If you are thereby playing for "fastest win over time", taking advantage of random's suboptimality seems sane. In 2048, the bottleneck on your score seems best approximated by how long you survive: pulling stunts won't get you to 2048 all that faster as you need to have worked through enough tiles to arrive at that point: I'd imagine the difference would be at best a tiny fraction of the "required" moves.

Meanwhile, the computer has only a few possible moves, increasing the probability of doing something accidentally optimal. As you approach the end, needing over half the board just for unbuilt path up to 1024 and thereby not having as much scratch space, the probability of it hitting a problematic (even if not "devastating", one that suddenly requires you to reorganize things to "clean up the mess") move seems more more of a problem than when playing chess.

In summary: I just don't think comparing this game to chess is leading to useful intuitions.

It sounds like you're thinking he suggested just greedily taking the best possible move. But that's not what he suggested; he suggested evaluating the actual distribution of the opponent's moves and making the move with the highest expected value.
No, I entirely understood that. I am quite happily willing to believe that I misunderstand the ramifications of that algorithm (I spent a lot of time talking about game theory in college, but it was not my field, and that was a long time ago), but that is the algorithm I inferred. (I would respond more, but you didn't leave me with any specific complaints to respond to, so all I can really say is "no".)
The question is do you want to have the shortest expected number of turns until you win, or a 100% win rate. Playing safe (min-max) is what ensures the 100% win rate, but you might be making games longer on average.
It doesn't have a 100% win rate as it stands.

Mini-max isn't even 'playing safe'.

Consider the following choice of moves, each leading to one of 5 random tile inserts:

Move A, which leads to 5 possible moves with the following game state goodnesses: ['Loss','Win','Win','Win','Win']

Move B, resulting in: ['99%CertainLoss','99%CertainLoss','99%CertainLoss','99%CertainLoss','99%CertainLoss']

Minimax is never going to choose A. Is that really what you want, even if your strategy is 'play it safe'?

Fair enough; I only ran it a couple times and it got to 2048 so I assumed it's a guaranteed win.
My impression is that you don't understand how these things work, so I'll start from the basics; apologies if wrong.

First off, the chess counter-example was in response to something general ovolve said about 'all minimax algos'.

More broadly, in practice the issue is that you only have computational resources to search a fraction of possible game states. So where do you direct your limited resources?

Some candidate strategies: minimax with AB, exhaustive, MC sampling.

Even with any of these strategies, we can usually only search the game state tree to a given depth. (This is the tree of 'If I go here, then the game goes there, then I go here etc'.) When we reach our depth limit (if we haven't reached a terminal state - i.e. a win or loss), we use a heuristic (maybe 'number of free tiles') which approximates the value of the state to us at that point.

>Meanwhile, the computer has only a few possible moves, increasing the probability of doing something accidentally optimal.

If the computer only has a few possible moves, then, yes, it might randomly choose the best one. But if it only has a few possible moves, chances are that any tree search technique, even stochastic, will start to expand all of them.

The question is, though, how will you know how good each of these moves is when you start examining are? I.e. Which of the moves available to you right now should you make? With Minimax/AB, you'll get a sense the move is optimal, because you'll look at the consequences of the move, (if the game makes the worst response to me (if I make the best move for me (if the game makes the worst response for me ... (heuristic evaluation)))).

With (sensible) MC search, you'll instead get a sense of which move is optimal by looking at more like what happens (if I make the best move for me (for each of a bunch of random moves the game could make (then if I make the best move for me (for each of a bunch of random moves the game could make ... (heuristic evaluation)))).

My point is that the latter is more suitable for this domain.

>As you approach the end, needing over half the board just for unbuilt path up to 1024 and thereby not having as much scratch space, the probability of it hitting a problematic (even if not "devastating", one that suddenly requires you to reorganize things to "clean up the mess") move seems more more of a problem than when playing chess.

Well, then, if the probability of it randomly hitting a problematic move is high in a certain state, your MC search will be highly likely to come across that move, and thus you'll be likely to avoid moves that bring you to that state. So, no problem.

In summary: 1) If the game is highly likely to make a devastating move by random chance, then you are highly likely to come across that move in your stochastic search, so that's not really an objection. 2) In this game, just as in chess, there are always more states you'd like to search and expand than you have the resources to. Even if the computer only has a handful of 'moves' at any time, in order to tell how good these moves around, you've still got to expand out a lot of states. Choosing to use minimax instead of MC doesn't save you from that.

I am having a difficult time figuring out how to respond to this comment given that the assumption going into this conversation is that you are wrong and we are only interested in figuring out why you are wrong. For avoidance of doubt, we are assuming you are wrong because ovolve claimed to have implemented an algorithm matching your description, that implementation was available for your perusal (it is only commented out, not deleted), and it didn't actually work better. Your comments, however, seem to continue to operate from the assumption that you are correct.

Also, while I do not have the background you do with search functions, I didn't feel like my comment (which I saw as offering an idea more than a proof: a comment about intuitions based on having wasted way way too much time playing 2048 yesterday and from being in the chess club at a different time in my life) warranted the "let me teach you the basics" paragraph, especially under the "we are working together to figure out why you are wrong" assumption. I am sufficiently confused by these differing approaches to the conversation as to not be certain how to proceed.

Like, "huh, ok, if you had a different idea for why you are wrong, what would it be?" is all I can come up with, but I don't think that fits your side of this interaction. (Maybe, if you simply feel you aren't wrong, you could look at ovolve's code and find something "wrong"/suboptimal with his algorithm? I assumed you had already done this, given the context, but maybe not? Clearly my assumptions are failing here.) I think I will just bow out, actually get some work done, and maybe ask my friends (whom have much more experience in this space than, to my belief, either of us) to explain this to me later ;P.

I am with saurik.

The utility function is an approximation. So MCMC is aggregating information over an erroneous space, and min/max is also optimizing over an erroneous space too.

Which is the correct thing to do is conditioned on how the utility function behaves. In this scenario I think min/max plays maximumly conservative, which empirically seems to be the best thing to do.

If the utility is minimize free space on the board, the min/max will try to get to a free board but doesn't take risks so gets there in a suboptimal route. The MCMC will take the odd risky move as long as a large proportion of the futures lead to a even emptier board.

Clearly you don't want risky moves, because do enough of them and it ends in disaster (and its a long game). So the utility function should be exponentially weighted against going near risky situations. However, developing such a utility function which combines well in MCMC really requires understanding too much about the future game dynamics.

For MCM to work, the utility function really needs to capture how potentially bad a situation is, and I don;t think that is easy. Min/max is naturally pessimistic in stratergy, which is probably the correct thing to do.

I agree that all approaches are approximations, and which would actually perform best in this game is an empirical question. It'd take some work to really explore and tune either minimax or MC approach, so I wouldn't throw either out due to one failed attempt.

I accept the argument that "minimax is conservative, and conservative is good" might be correct. But I don't think its likely, and, without the time to code my own solution, all I can do is give arguments to that end.

I gave one intuitive argument here: https://news.ycombinator.com/item?id=7381382

Another argument is to remember that minimax, and AB-pruning, is a really strong way of reducing your search space - because of how unfavourable adversary moves are propagated up the tree - which could result in drastic pruning if the minimax assumption is wrong.

One 'bad' state, 6 or 8 ply deep through your branching factor ~10 tree, can result in you pruning entire lines of enquiry using minimax AB; surely that can't be right if the chance of the bad state happening is tiny, and especially if an alternative is chosen which isn't much better.

So I still think that if you want to tune the search algorithm to be risk adverse, then, yes, do so; but minimax is a drastic way to achieve that. But yes, how big of an effect that decision has in practice depends on complicated things, such as correlations in the search tree. (i.e. If you find a bad state in a section of the game tree, maybe there are likely to be other bad states nearby, so its not such a big deal to prune that whole section using minimax).

> That doesn't make sense when talking about a random opponent, though.

Sure it does. AB search means you play to maximise your own value and minimise the value for your opponent. If your opponent is using a random strategy they will likely make a poor move and that's extra advantage for you.

Your argument here (and elsewhere in this thread) seems to be: if the opponent is random a stochastic strategy is best. This is simply not true. Stochastic strategies like MC search have an advantage over game-tree search when the game's branching factor limits you to considering just a few ply ahead (e.g. as in Go).

Consider this game:

You choose either Die or Coin.

If you chose Die, I will roll a die. If the die is 1, I give you $0; otherwise, $1000.

If you choose the coin, I'll give you $10 for heads, and $20 for tails.

If you use a minimax policy, you will choose the coin, because the worst outcome of that choice is a $10 payoff. The 'best' 'worst' outcome is what you get with minimax, and that's $10.

It is true that, as you say, if you instead get $20, you will be pleasantly surprised. "extra advantage for you", as you put it.

But it should be crystal clear that minimax is nonetheless the wrong decision policy here. If you use a minimax policy when the outcomes are random, you will generally be doing it wrong.

There are exceptions (e.g. you are starving and need $10 or else you'll die; or you are in my casino, and you think I'm using loaded dice, in which case the outcomes aren't random and you have an adversary; etc.) but in general, minimax is just wrong there.

>your argument here (and elsewhere in this thread) seems to be: if the opponent is random a stochastic strategy is best. This is simply not true.

No - my argument is that if an opponent is random a minimax strategy is generally not best.

>Stochastic strategies like MC search have an advantage over game-tree search when the game's branching factor limits you to considering just a few ply ahead (e.g. as in Go).

Yes, that's also true.

In the game of 2048, it's actually about risk minimization - in general, you progress at a steady, fixed speed unless something ugly happens. If you're in a safe position (which you should be throughout almost all of the game) then your possible gains from a great move are zero; if you don't take some advantage this turn then you'll generally get it next turn; but a single bad move can put you in an unfixable position.

There are almost no opportunities to make a 'great' move - since a great move with a significantly better effect on chance of winning than a move which doesn't change anything at all - is possible only if the move fixes a huge earlier mistake that you wouldn't have made in the first place.

For this particular game, advantages are temporary and disadvantages are near-permanent - so it makes sense to play very defensively, which minmax does. Imagine a game of Die or Coin, where if you choose coin, then you get $10 for heads, and $20 for tails; and for dice you throw a hundred-side die and get $25 for values 2-100 but if you roll 1, then you get shot and die.

[edit] what I'm saying is that assuming that [a] all payoffs are either effectively 0 or -infinity; and [b] most moves will (in the near expected future) be either 0 chance of the bad event or >0 chance of the bad event; then minmax would generate equal results to MC search - however, MC search would fail badly if you put overly optimistic payoffs, i.e., give too large rewards for 'good moves' and too little penalties for bad moves; and this is hard to estimate.

Minmax works if your payoff scale is completely wrong by orders of magnitude as long as the preference ordering is correct, MC search doesn't. If you know that position A is better than B but don't know if it is 1.1 times better or a million times better - then you can't implement a good MC search but can implemnt minmax.

You describe a one-player game with stochastic outcomes. I'm talking about a deterministic two-player game with one player making random moves. They're not the same thing.
Actually they're exactly the same. As in, completely identical games under a very trivial mapping. Think about how you'd formulate each one in terms of payoff matrices...
You're right. Still, there is something missing here: the game has no win condition. It is true that the best play seems to be to go for dice but suppose that rolling a 1 means I lose the game and forefit all my winnings. Why would I ever play that?
I suspect that the best objective function would be "maximize the time until I lose" which is closer to avoiding the worst case scenario than it is to performing best in the average case scenario.
I was also working on a 2048 AI, but you beat me to it! Nice job.

I was at an early stage, my scoring is just the game score (which apparently is flawed!) so nothing fancy. My approach to the large branching factor with the random tiles was just to ignore them and continue the search with a random one until there are less than N tiles free (I empirically found 4/5 to be good), where I perform a full search and apply a simple mean to the resulting scores. Mine is not even minimax, just a simple depth-first search. It wins between 1/4 and 1/5 of the time in a small sample of 200 games, how does yours perform?

This would be a great AI competition! Too bad it takes so long to run lots of games (mine is ~10 turns a second).

I wonder if web workers would speed it up.

> However I think there's some truth in the fact that sometimes an unpredicted random computer move can mess things up.

Isn't this guaranteed by the no free lunch theorem?

No. No Free Lunch Theorem means that there is no algorithm (that is programmed with no awareness of any specific search space) can be successful in ALL search spaces (problems).

But 2048 is a highly-structured search space. Even with random opponent, the opponent's choices are constrained by the game's rule structure.

http://en.wikipedia.org/wiki/No_free_lunch_theorem

The new tiles only appears on the opposite of your move direction. Your code does not seem to care about that. Maybe it will perform better if this is considered?
A friend has a question for you: "How many moves does it think ahead? Or does it just think in the 'here-and-now' (only 1 move ahead)?"
Someone else posted that it uses the minimax algorithm with alpha beta pruning and iterative deepening.

Minimax[1] looks "several" moves ahead to see what may happen. It assumes you always play your best possible move (to MAXimise your score) and your opponent always plays their best possible move (to maximise their score and MINimises yours). Because this creates a huge amount of possibilities, their is a method to reduce the number of moves considered called alpha beta pruning[2]. Some future moves can never lead to a better score for you and so are discarded or ignored.

Both approaches need you to look as many moves ahead as you can - each extra move you consider multiplies the number of possibilities exponentially, so there is another method called iterative deepening[3] which uses the algorithm to look 1 move ahead to find the best move and records that move. Then it repeats looking at 2 moves ahead and if it finds a better score than with 1 move it keeps that instead. Then 3 moves, then 4. The program will continue looking at more and more moves ahead until it reaches a time limit (or a size limit.)

[1] https://en.wikipedia.org/wiki/Minimax#Minimax_algorithm_with...

[2] https://en.wikipedia.org/wiki/Alpha-beta_pruning

[3] https://en.wikipedia.org/wiki/Iterative_deepening_depth-firs...