|
|
|
|
|
by yshui
731 days ago
|
|
I think this is doable. Say we assign a win rate W(S) to each board state S, and let W(S, A) denote the win rate after taking action A from state S. Since the transition is probabilistic, we can write: W(S, A) = P(good) * (1 - W(S_good)) + P(bad) * (1 - W(S_bad)) + (1 - P(good) - P(bad)) * (1 - W(S)) And obvisouly: W(S) = max(W(S, A), foreach A in Actions) max() is troublesome, but we can replace it with a >= sign: W(S) >= (W(S, A), forall A in Actions) And then we can expand W(S, A) and move W(S) in each of the inequalities to the left hand side. After all that we will have a bunch of linear inequalities which can be optimized with linear programming. I think the objective would just be: <del>maximize</del> minimize W(empty_board) |
|