| Consider this game: You choose either Die or Coin. If you chose Die, I will roll a die. If the die is 1, I give you $0; otherwise, $1000. If you choose the coin, I'll give you $10 for heads, and $20 for tails. If you use a minimax policy, you will choose the coin, because the worst outcome of that choice is a $10 payoff. The 'best' 'worst' outcome is what you get with minimax, and that's $10. It is true that, as you say, if you instead get $20, you will be pleasantly surprised. "extra advantage for you", as you put it. But it should be crystal clear that minimax is nonetheless the wrong decision policy here.
If you use a minimax policy when the outcomes are random, you will generally be doing it wrong. There are exceptions (e.g. you are starving and need $10 or else you'll die; or you are in my casino, and you think I'm using loaded dice, in which case the outcomes aren't random and you have an adversary; etc.) but in general, minimax is just wrong there. >your argument here (and elsewhere in this thread) seems to be: if the opponent is random a stochastic strategy is best. This is simply not true. No - my argument is that if an opponent is random a minimax strategy is generally not best. >Stochastic strategies like MC search have an advantage over game-tree search when the game's branching factor limits you to considering just a few ply ahead (e.g. as in Go). Yes, that's also true. |
There are almost no opportunities to make a 'great' move - since a great move with a significantly better effect on chance of winning than a move which doesn't change anything at all - is possible only if the move fixes a huge earlier mistake that you wouldn't have made in the first place.
For this particular game, advantages are temporary and disadvantages are near-permanent - so it makes sense to play very defensively, which minmax does. Imagine a game of Die or Coin, where if you choose coin, then you get $10 for heads, and $20 for tails; and for dice you throw a hundred-side die and get $25 for values 2-100 but if you roll 1, then you get shot and die.
[edit] what I'm saying is that assuming that [a] all payoffs are either effectively 0 or -infinity; and [b] most moves will (in the near expected future) be either 0 chance of the bad event or >0 chance of the bad event; then minmax would generate equal results to MC search - however, MC search would fail badly if you put overly optimistic payoffs, i.e., give too large rewards for 'good moves' and too little penalties for bad moves; and this is hard to estimate.
Minmax works if your payoff scale is completely wrong by orders of magnitude as long as the preference ordering is correct, MC search doesn't. If you know that position A is better than B but don't know if it is 1.1 times better or a million times better - then you can't implement a good MC search but can implemnt minmax.