| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by lightvector 4515 days ago

I'm not too surprised that alpha-beta does well, because when you are in a "good" state, playing so as to ensure that even in the worst case, you will still remain in as good a state as possible is very similar to playing to minimize the probability of ending up in a bad state in the next few moves. Obviously, the latter is your actual goal rather than the former. But the former is often a decent approximation and has the benefit that if most of the time there actually does exist a way you can maintain a "good" state even under perfectly adversarial play, then alpha-beta pruning will let you search much deeper and faster to find that way.

However, I would not dismiss MC-search or expectimax. Whereas alpha-beta should perform well when the current position is in a "good" state, when the current position is in a "bad" or "tricky" state, it almost certainly will perform way worse than search methods that actually take into account the opponent being random.

To consider the extreme example, imagine the position is extremely tenuous and that alpha-beta has proven that if the tiles are placed in the worst possible way in the next several turns, it is guaranteed the position is a loss and a 2048 is impossible. Obviously, it would be better that point to switch to something like expectimax and play so as to minimize the probability of a loss and maximize the probability of recovering, which could still be quite large. Whereas with alpha-beta the only thing you can do is maximize the number of moves until the forced loss, which will often not be the same thing as maximizing the probability of recovery, particularly if most reasonable branches of play lead to a forced loss in the same number of moves.

Similar arguments apply when the position is not a forced loss yet but is still "bad". For example, if the "bad" position is such that your long-term win probability is only 30%, I would expect alpha-beta to pass-up opportunities to fix the position if those opportunities carried any risk of making the position worse, whereas taking those risks is clearly correct if they could fix the position into a probable win and the probability of making things worse was only 1/2 or 1/3.

On the relevant stackoverflow thread, there is another answer much further down that has received far less attention, which is a shame: http://stackoverflow.com/questions/22342854/what-is-the-opti...

I've downloaded and compiled the solver, which uses expectimax rather than alpha-beta, and uses heuristics that are arguably even simpler and dumber, and yet it seems to outperform the OP's solver. It gets 4096 with good consistency and even often gets an 8192, whereas the OP's solver (after a tweak to make it not stop at 2048) can sometimes require a few tries to get a 4096 and only rarely gets an 8192. Granted, there's a large computing power difference, since one runs as native compiled code and one runs as javascript in the browser, but it shows that expectimax, which one would expect to be the most theoretically-sound approach, is indeed at least a comparably good approach, and possibly a superior one.