by TemplateRex 1436 days ago
What I don't understand is why they don't try to make inferences about the opponent's private state. I get that the full Bayesian update is intractable, but some sort of RNN or LSTM should be able to produce pretty accurate estimates of the opponent's private info. And with self-play, you can train a deduction head of the NN by adding a KL-divergence loss between the inferred and the ex-post observed pieces. That would both make you guess better and push you to "jam" your opponent's inference by randomizing your own piece distribution.
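
A minimal sketch of that deduction loss, assuming (hypothetically) that the head outputs one categorical distribution over piece types per hidden piece and that the true identities are revealed once the game ends. The names `kl_divergence` and `deduction_loss` are illustrative and not taken from the paper. With a one-hot target, the KL term reduces to the negative log-probability assigned to the true piece.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; q_i is clamped to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def deduction_loss(inferred, revealed):
    """Mean KL between revealed (one-hot) piece types and inferred distributions.

    inferred: list of probability lists, one per hidden piece (assumed head output)
    revealed: list of true piece-type indices, known only after the game
    """
    total = 0.0
    for probs, true_idx in zip(inferred, revealed):
        one_hot = [1.0 if i == true_idx else 0.0 for i in range(len(probs))]
        total += kl_divergence(one_hot, probs)
    return total / len(revealed)

# Toy example: two hidden pieces, three possible piece types.
inferred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
revealed = [0, 2]
loss = deduction_loss(inferred, revealed)
```

In a real setup this term would be added to the training loss of the network and computed on self-play games, where both players' pieces are known after the fact.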
2 comments

This is an interesting avenue for future research. It is not as straightforward as you claim, because all inference depends on your model of their policy. That's why the Nash equilibrium is sought first: you should assume your opponent is perfect until you start observing suboptimal behavior that you can exploit. You would also have to handle the meta-level problem of making sure the exploiting part of the algorithm isn't itself being exploited by the opponent. Somehow, you should deviate slowly from the Nash equilibrium but revert quickly if the opponent is abusing your new strategy.
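
One way to picture "deviate slowly, revert quickly" is an asymmetric update on a mixing weight between the equilibrium policy and an exploitative one. This is only a sketch under my own assumptions: the function names, the step sizes `grow` and `shrink`, and the cap are all hypothetical, and nothing here bounds how exploitable the mixture is.

```python
def update_exploit_weight(weight, recent_gain, grow=0.02, shrink=0.25, cap=0.5):
    """Raise the exploit weight additively on gains, cut it multiplicatively on losses."""
    if recent_gain >= 0:
        return min(cap, weight + grow)   # deviate slowly
    return weight * shrink               # revert quickly

def mix_policies(nash, exploit, weight):
    """Convex combination of two action distributions over the same actions."""
    return [(1 - weight) * n + weight * e for n, e in zip(nash, exploit)]

# Toy run: three good results for the deviation, then one bad one.
w = 0.0
history = []
for gain in [0.1, 0.2, 0.1, -0.5]:
    w = update_exploit_weight(w, gain)
    history.append(w)

nash = [1 / 3, 1 / 3, 1 / 3]
exploit = [1.0, 0.0, 0.0]
policy = mix_policies(nash, exploit, w)
```

The weight climbs in small steps while the deviation pays off and drops to a quarter of its value after a single loss, so the mixed policy falls back toward the equilibrium much faster than it left it.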
But their NN already outputs a policy conditional on public and private info! Why not have a separate intermediate branch in the NN that is fed with the current estimate of private info (for both players) and outputs the policies (again for both players) given those info estimates? Wouldn't it be possible to learn from that?
First, the neural network is taking the history of observations into account. We don't know what the NN has learned, but it is probably making some inference about the likelihood of opponent piece locations. They haven't explicitly coded it to do that, but it is difficult to imagine a human-level AI not doing this.

Second, what you are suggesting is probably best done as a secondary process outside of learning the Nash equilibrium. If you knew an opponent's policy, you would need to recalculate your optimal counterplay for that specific policy. This is completely orthogonal to the goal of this paper, which is to learn the Nash equilibrium through self-play alone.
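
To make "recalculate your optimal counterplay" concrete, here is a toy best-response computation in rock-paper-scissors, used as a stand-in for a game far too large to enumerate. The payoff matrix, the opponent policy, and the function names are illustrative assumptions, not anything from the paper.

```python
# Row player's payoff; actions are indexed 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def action_values(payoff, opponent_policy):
    """Expected payoff of each of our actions against a fixed opponent policy."""
    return [sum(u * p for u, p in zip(row, opponent_policy)) for row in payoff]

def best_response(payoff, opponent_policy):
    """Index of the action with the highest expected payoff."""
    values = action_values(payoff, opponent_policy)
    return max(range(len(values)), key=lambda a: values[a])

# An opponent that plays rock too often is best answered with paper.
biased = [0.5, 0.3, 0.2]
br = best_response(PAYOFF, biased)

# Against the equilibrium (uniform) policy every action has the same value,
# so there is nothing to exploit.
uniform_values = action_values(PAYOFF, [1 / 3, 1 / 3, 1 / 3])
```

The best response changes whenever the assumed opponent policy changes, which is why this step sits outside the equilibrium-learning loop.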

Bayesian play is not necessarily optimal in imperfect-information games: you need not only to play well with respect to the information you have observed, but also to hide your own information, and you have to balance those two needs.

See the DeepMind "Player of Games" paper from last year for an agent that takes a more game-theoretic approach, which is probably needed for "simpler" games like poker that we can play to higher levels of accuracy: https://arxiv.org/pdf/2112.03178.pdf