| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by yshui 779 days ago

I think this is doable. Say we assign a win rate W(S) to each board state S, and let W(S, A) denote the win rate after taking action A from state S. Since the transition is probabilistic, we can write:

W(S, A) = P(good) * (1 - W(S_good)) + P(bad) * (1 - W(S_bad)) + (1 - P(good) - P(bad)) * (1 - W(S))

And obvisouly:

W(S) = max(W(S, A), foreach A in Actions)

max() is troublesome, but we can replace it with a >= sign:

W(S) >= (W(S, A), forall A in Actions)

And then we can expand W(S, A) and move W(S) in each of the inequalities to the left hand side. After all that we will have a bunch of linear inequalities which can be optimized with linear programming. I think the objective would just be:

<del>maximize</del> minimize W(empty_board)

2 comments

Labo333 778 days ago

There is an easier way to solve each recursion! I just wrote a blog post on it: https://louisabraham.github.io/articles/probabilistic-tic-ta...

link

yshui 777 days ago

Yep, this is what I ended up doing as well! With how the game generate boards, the player that goes first always have a ~5% advantage. Since players switch hands each around they should have 50% win rate if both play optimally.

In practice, playing against author's AI I barely get ~60% win rate (small caveat, I count ties as 0.5 to both players). What about yours?

Edit: nvm I saw you did the same with ties.

link

orlp 778 days ago

I think you have an error in the equation defining V(s).

You have component n_c * V(s) for the 'nothing happened' case, but I don't think that's correct. If you rolled that nothing happens the turn still passes to your opponent, so I think it should be n_c * V'(s).

link

Labo333 778 days ago

oh RIGHT. Gotta fix it

link

yshui 778 days ago

Turns out linear programming is not fast... Takes about 90 minutes to find the optimal solution for any board configuration.

link