by hervature
1441 days ago
This is an interesting avenue for future research. It is not as straightforward as you claim, because all inference depends on your estimate of the opponent's policy. That's why the Nash equilibrium is sought first: you should assume your opponent plays perfectly until you observe suboptimal behavior that you can exploit. You would also have to handle the meta level, making sure the exploiting part of the algorithm isn't itself being exploited by the opponent. Somehow, you need to deviate slowly from the Nash equilibrium but revert quickly if the opponent starts abusing your new strategy.
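To make "deviate slowly, revert quickly" concrete, here is a minimal sketch in Python under several assumptions that are not from the comment: the game is rock-paper-scissors (whose Nash strategy is uniform, with game value 0), the opponent model is just smoothed action counts, and the exploitation weight follows a hypothetical AIMD-style rule (additive increase, multiplicative decrease) driven by a smoothed payoff average. The class name `CautiousExploiter` and its parameters (`step_up`, `backoff`, `ema`, `margin`) are illustrative inventions, not an established algorithm.

```python
import random

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def payoff(mine: str, theirs: str) -> int:
    """+1 for a win, 0 for a tie, -1 for a loss (the game value is 0)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


class CautiousExploiter:
    """Hypothetical heuristic: mix the Nash strategy (uniform) with a best
    response to the opponent's empirical action frequencies. The exploitation
    weight w grows additively while results look fine and is halved while
    they look bad."""

    def __init__(self, step_up=0.01, backoff=0.5, ema=0.05, margin=0.2):
        self.counts = {a: 1 for a in ACTIONS}  # Laplace-smoothed opponent model
        self.w = 0.0               # 0 = pure Nash, 1 = pure best response
        self.step_up = step_up     # slow additive deviation away from Nash
        self.backoff = backoff     # fast multiplicative retreat toward Nash
        self.ema = ema             # smoothing factor for the running payoff
        self.margin = margin       # tolerated shortfall below the game value 0
        self.avg_payoff = 0.0

    def best_response(self) -> str:
        total = sum(self.counts.values())

        def expected(mine: str) -> float:
            # Expected payoff of `mine` against the empirical opponent model.
            return sum(self.counts[t] / total * payoff(mine, t) for t in ACTIONS)

        return max(ACTIONS, key=expected)

    def strategy(self) -> dict:
        # (1 - w) * Nash + w * (pure best response); probabilities sum to 1.
        br = self.best_response()
        return {a: (1 - self.w) / 3 + (self.w if a == br else 0.0) for a in ACTIONS}

    def act(self, rng: random.Random) -> str:
        s = self.strategy()
        return rng.choices(ACTIONS, weights=[s[a] for a in ACTIONS])[0]

    def observe(self, mine: str, theirs: str) -> None:
        self.counts[theirs] += 1
        self.avg_payoff = (1 - self.ema) * self.avg_payoff + self.ema * payoff(mine, theirs)
        if self.avg_payoff >= -self.margin:
            self.w = min(1.0, self.w + self.step_up)  # deviate slowly
        else:
            self.w *= self.backoff                   # revert quickly


if __name__ == "__main__":
    rng = random.Random(0)
    agent = CautiousExploiter()
    total = 0
    for _ in range(3000):
        mine = agent.act(rng)
        theirs = "rock"  # a fixed, exploitable opponent
        total += payoff(mine, theirs)
        agent.observe(mine, theirs)
    print(f"final w={agent.w:.2f}, total payoff={total}")
```

The asymmetry between `step_up` and `backoff` is what encodes the slow-out, fast-back idea. It only partly addresses the meta problem: the smoothed payoff reacts with a lag of several rounds, and an opponent who understands the rule could try to exploit that window.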