| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by feral 4526 days ago

Consider this game:

You choose either Die or Coin.

If you chose Die, I will roll a die. If the die is 1, I give you $0; otherwise, $1000.

If you choose the coin, I'll give you $10 for heads, and $20 for tails.

If you use a minimax policy, you will choose the coin, because the worst outcome of that choice is a $10 payoff. The 'best' 'worst' outcome is what you get with minimax, and that's $10.

It is true that, as you say, if you instead get $20, you will be pleasantly surprised. "extra advantage for you", as you put it.

But it should be crystal clear that minimax is nonetheless the wrong decision policy here. If you use a minimax policy when the outcomes are random, you will generally be doing it wrong.

There are exceptions (e.g. you are starving and need $10 or else you'll die; or you are in my casino, and you think I'm using loaded dice, in which case the outcomes aren't random and you have an adversary; etc.) but in general, minimax is just wrong there.

>your argument here (and elsewhere in this thread) seems to be: if the opponent is random a stochastic strategy is best. This is simply not true.

No - my argument is that if an opponent is random a minimax strategy is generally not best.

>Stochastic strategies like MC search have an advantage over game-tree search when the game's branching factor limits you to considering just a few ply ahead (e.g. as in Go).

Yes, that's also true.

2 comments

PeterisP 4525 days ago

In the game of 2048, it's actually about risk minimization - in general, you progress at a steady, fixed speed unless something ugly happens. If you're in a safe position (which you should be throughout almost all of the game) then your possible gains from a great move are zero; if you don't take some advantage this turn then you'll generally get it next turn; but a single bad move can put you in an unfixable position.

There are almost no opportunities to make a 'great' move - since a great move with a significantly better effect on chance of winning than a move which doesn't change anything at all - is possible only if the move fixes a huge earlier mistake that you wouldn't have made in the first place.

For this particular game, advantages are temporary and disadvantages are near-permanent - so it makes sense to play very defensively, which minmax does. Imagine a game of Die or Coin, where if you choose coin, then you get $10 for heads, and $20 for tails; and for dice you throw a hundred-side die and get $25 for values 2-100 but if you roll 1, then you get shot and die.

[edit] what I'm saying is that assuming that [a] all payoffs are either effectively 0 or -infinity; and [b] most moves will (in the near expected future) be either 0 chance of the bad event or >0 chance of the bad event; then minmax would generate equal results to MC search - however, MC search would fail badly if you put overly optimistic payoffs, i.e., give too large rewards for 'good moves' and too little penalties for bad moves; and this is hard to estimate.

Minmax works if your payoff scale is completely wrong by orders of magnitude as long as the preference ordering is correct, MC search doesn't. If you know that position A is better than B but don't know if it is 1.1 times better or a million times better - then you can't implement a good MC search but can implemnt minmax.

link

gradstudent 4526 days ago

You describe a one-player game with stochastic outcomes. I'm talking about a deterministic two-player game with one player making random moves. They're not the same thing.

link

bermanoid 4525 days ago

Actually they're exactly the same. As in, completely identical games under a very trivial mapping. Think about how you'd formulate each one in terms of payoff matrices...

link

gradstudent 4525 days ago

You're right. Still, there is something missing here: the game has no win condition. It is true that the best play seems to be to go for dice but suppose that rolling a 1 means I lose the game and forefit all my winnings. Why would I ever play that?

link

feral 4525 days ago

Why would you ever play dice, if rolling 1 meant losing everything?

Lots of situations. If you had $0, obviously.

Probably if you had less than $833.33 (the expected value of the dice in my game), slightly more subtly.

Maybe even other scenarios, depending on how long you wanted to play for, if there was a time cost to playing, if you could play for as long as you wanted, what your utility/risk preferences were, etc.

The point is that there are appropriate tools for reasoning about such games correctly - utility, decision theory, etc. - and they beat minimax.

link

gradstudent 4524 days ago

You're changing the rules and shifting the goalposts. Look, it's simple: if the objective is to win then you want to avoid giving your opponent any chance to beat you. That means you never choose dice.

> The point is that there are appropriate tools for reasoning about such games correctly - utility, decision theory, etc. - and they beat minimax.

Perhaps this is true but you have not convinced me that this is the case here. Moreover, the OP claims to have empirical results showing that AB-pruning does better than MC-search for the game of 2048.

link

lightvector 4515 days ago

I'm not too surprised that alpha-beta does well, because when you are in a "good" state, playing so as to ensure that even in the worst case, you will still remain in as good a state as possible is very similar to playing to minimize the probability of ending up in a bad state in the next few moves. Obviously, the latter is your actual goal rather than the former. But the former is often a decent approximation and has the benefit that if most of the time there actually does exist a way you can maintain a "good" state even under perfectly adversarial play, then alpha-beta pruning will let you search much deeper and faster to find that way.

However, I would not dismiss MC-search or expectimax. Whereas alpha-beta should perform well when the current position is in a "good" state, when the current position is in a "bad" or "tricky" state, it almost certainly will perform way worse than search methods that actually take into account the opponent being random.

To consider the extreme example, imagine the position is extremely tenuous and that alpha-beta has proven that if the tiles are placed in the worst possible way in the next several turns, it is guaranteed the position is a loss and a 2048 is impossible. Obviously, it would be better that point to switch to something like expectimax and play so as to minimize the probability of a loss and maximize the probability of recovering, which could still be quite large. Whereas with alpha-beta the only thing you can do is maximize the number of moves until the forced loss, which will often not be the same thing as maximizing the probability of recovery, particularly if most reasonable branches of play lead to a forced loss in the same number of moves.

Similar arguments apply when the position is not a forced loss yet but is still "bad". For example, if the "bad" position is such that your long-term win probability is only 30%, I would expect alpha-beta to pass-up opportunities to fix the position if those opportunities carried any risk of making the position worse, whereas taking those risks is clearly correct if they could fix the position into a probable win and the probability of making things worse was only 1/2 or 1/3.

On the relevant stackoverflow thread, there is another answer much further down that has received far less attention, which is a shame: http://stackoverflow.com/questions/22342854/what-is-the-opti...

I've downloaded and compiled the solver, which uses expectimax rather than alpha-beta, and uses heuristics that are arguably even simpler and dumber, and yet it seems to outperform the OP's solver. It gets 4096 with good consistency and even often gets an 8192, whereas the OP's solver (after a tweak to make it not stop at 2048) can sometimes require a few tries to get a 4096 and only rarely gets an 8192. Granted, there's a large computing power difference, since one runs as native compiled code and one runs as javascript in the browser, but it shows that expectimax, which one would expect to be the most theoretically-sound approach, is indeed at least a comparably good approach, and possibly a superior one.

link