| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jasisz 3935 days ago
	This algorithm (in it's regular form used often in games and examples) has one interesting "downside" I was exploring some time ago - selection is performed using the UCB formula. So basically it tries to maximize the player payout. But in the most games this is in fact impractical assumption, because we end up tending to expand branches that will be most likely not chosen by our opponent. As in the example (I assume gray moves are "our" moves) - we will much more likely choose to expand 5/6 branch instead of the 2/4, that will be in fact more likely chosen by our opponent.

1 comments

panic 3935 days ago

In a game state where your opponent will be choosing the next move, you should select the next state to expand based on a UCB formula involving your opponent's expected payout, not your own.

link

jasisz 3935 days ago

But this requires storing and back propagating this info for the other players - something I really haven't seen in any examples (nor in this article). We cannot also assume that game is always zero-sum game and this information is not needed.

link