| HN Mirror

Not quite: it really does only go to Nash in a 2p zero sum game.

In a multiplayer zero-sum game, there's no theoretical proof that it should go to Nash. In the tiny 3p game of 3p Kuhn poker (3 players, 4 card deck, enough chips for one bet) we found that it does go to Nash (here's the paper: https://webdocs.cs.ualberta.ca/~games/poker/publications/AAM...). But in a slightly larger toy game, 3p Leduc poker, we empirically found that it does not. Like you suggested, we did this with best response calculations. If you take the strategy profile CFR produces and then compute a best response in each position, those best responses each improve over the strategy profile. If you plot that loss over time while CFR is running, it becomes clear that it does not converge to Nash: it eventually bottoms out and does not improve further.

However - in practice, the strategies produced by CFR for multiplayer games still appear to be quite strong, even if not Nash. This was the approach used by the University of Alberta for years in the Annual Computer Poker Competition, and it crushed the competition there. In 3p Limit Hold'em, the CFR strategies also held up well against human pros in informal testing by the U of A.

Nash equilibrium isn't a compelling goal in a multiplayer game anyways. In a 2p game it theoretically guarantees that you earn at least the game value (i.e., do no worse than tie if you rotate seats every game). In a multiplayer game, even if you could compute one, there's no such theoretical guarantee, when all other players at the table are independently choosing strategies. Even if everyone at the table uses an independently computed Nash equilibrium, that set of strategies is not necessarily itself a Nash equilibrium (in 2p zero sum games, it is).

To summarize - for multiplayer, it doesn't necessarily go to Nash. But you might not care if it does or not, and it works great in practice even though it doesn't go to Nash.

In a non-zero-sum game, you're right that CFR converges to a correlated equilibrium. But we can also theoretically bound how close it gets to a Nash for the non-zero-sum game, and empirically measure how close it gets by using best response analysis. The answer is - if it's only slightly non-zeo-sum (e.g., subtracting the rake in a poker game for negative sum, or pretending to add money to the pot to encourage aggressive play) then CFR still gets very close to Nash.