by TemplateRex
1436 days ago
But their NN already outputs a policy conditional on public and private info! Why not have a separate intermediate branch in the NN that is fed with the current estimate of private info (for both players) and outputs the policies (again for both players) given those info estimates? Wouldn't it be possible to learn from that?
|
Second, what you are suggesting is probably best done as a secondary process, separate from learning the Nash equilibrium. If you knew an opponent's policy, you would need to recalculate your optimal counterplay for that specific policy. This is completely orthogonal to the goal of this paper, which is to learn the Nash equilibrium through self-play alone.
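
To make the distinction concrete, here is a toy sketch in rock-paper-scissors, not the game or method from the paper. The payoff matrix, the `best_response` helper, and the rock-heavy `biased_opponent` policy are all made up for illustration. It shows that the counterplay has to be recomputed for each specific opponent policy, while the uniform equilibrium strategy stays the same no matter who you face.

```python
# Our payoff in rock-paper-scissors (zero-sum).
# Rows: our action, columns: opponent's action, in the order rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def action_values(payoff, opponent_policy):
    """Expected payoff of each of our pure actions against a fixed opponent policy."""
    return [sum(p * q for p, q in zip(row, opponent_policy)) for row in payoff]


def best_response(payoff, opponent_policy):
    """Pure best response (action index, expected value) to a known opponent policy."""
    values = action_values(payoff, opponent_policy)
    action = max(range(len(values)), key=lambda i: values[i])
    return action, values[action]


def value_against(payoff, our_policy, opponent_policy):
    """Expected payoff of our mixed policy against a fixed opponent policy."""
    values = action_values(payoff, opponent_policy)
    return sum(w * v for w, v in zip(our_policy, values))


nash = [1 / 3, 1 / 3, 1 / 3]        # equilibrium strategy for this toy game
biased_opponent = [0.5, 0.3, 0.2]   # hypothetical opponent who overplays rock

br_action, br_value = best_response(PAYOFF, biased_opponent)
nash_value = value_against(PAYOFF, nash, biased_opponent)
print(br_action, br_value, nash_value)
```

Against this particular opponent the best response is to always play paper, which wins on average, whereas the equilibrium strategy only breaks even. Change `biased_opponent` and the best response changes with it, which is why exploiting a known opponent is a separate computation from finding the equilibrium.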