by hervature
1437 days ago
First, the neural network takes the history of observations into account. We don't know exactly what the NN has learned, but it is probably inferring the likelihood of opponent piece locations. The authors haven't explicitly coded it to do that, but it is difficult to imagine a human-level AI not doing this. Second, what you are suggesting is probably best done as a secondary process outside of learning the Nash equilibrium. If you knew an opponent's policy, you would need to recalculate your optimal counterplay (a best response) for that specific policy. This is completely orthogonal to the goal of this paper, which is to learn the Nash equilibrium through self-play alone.
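To make the distinction concrete, here is a toy sketch of my own (not from the paper) using rock-paper-scissors instead of the actual game. The payoff matrix, the `biased` opponent policy, and the helper names are all assumptions for illustration. The Nash policy is fixed and earns the game value regardless of the opponent, while the counterplay has to be recomputed for each specific opponent policy.

```python
# Row player's payoff in rock-paper-scissors (rows and columns: rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
MOVES = ["rock", "paper", "scissors"]


def expected_payoffs(opponent_policy):
    """Expected payoff of each pure move against a fixed opponent mixed policy."""
    return [
        sum(p * PAYOFF[i][j] for j, p in enumerate(opponent_policy))
        for i in range(len(MOVES))
    ]


def best_response(opponent_policy):
    """Pure move with the highest expected payoff against the given policy."""
    values = expected_payoffs(opponent_policy)
    best = max(range(len(MOVES)), key=lambda i: values[i])
    return MOVES[best], values[best]


def value_of(policy, opponent_policy):
    """Expected payoff of a mixed policy against a fixed opponent policy."""
    values = expected_payoffs(opponent_policy)
    return sum(q * v for q, v in zip(policy, values))


nash = [1 / 3, 1 / 3, 1 / 3]   # Nash equilibrium of rock-paper-scissors
biased = [0.5, 0.25, 0.25]     # hypothetical opponent who overplays rock

print(value_of(nash, biased))  # Nash policy only secures the game value
print(best_response(biased))   # exploiting this opponent needs a separate computation
```

Against this particular opponent the best response is to always play paper, which earns more than the Nash policy does, but that answer is only valid for this one opponent policy and is itself exploitable. That is the sense in which opponent modelling is a separate step from finding the equilibrium.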