by VodkaHaze 3422 days ago
CFR is built on "regret matching", which is a policy update rule. Roughly, it says that at every iteration, the probability you take an action equals that action's positive cumulative counterfactual regret divided by the total positive cumulative counterfactual regret over all actions. Negative regrets are treated as zero, and if no action has positive regret you just play uniformly at random.
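To make the update rule concrete, here is a minimal sketch of just the regret-to-probability step in Python. The function name `regret_matching_strategy` and the plain-list representation are my own choices for illustration, not taken from any particular CFR library.

```python
def regret_matching_strategy(cumulative_regret):
    """Turn cumulative regrets (one per action) into action probabilities."""
    # Only positive regret counts; negative regret is clipped to zero.
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regret)
    return [1.0 / n] * n
```

For example, cumulative regrets of `[3.0, -1.0, 1.0]` put three quarters of the probability on the first action, none on the second, and the rest on the third.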

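The cool thing about it is that it provably converges. In two-player zero-sum games, the average strategy over all iterations (not the current one) converges to a Nash equilibrium. In non-zero-sum games, the empirical distribution of play converges to the set of coarse correlated equilibria, which is the guarantee you get from minimizing external regret.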
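A rough way to see the zero-sum convergence is self-play on rock-paper-scissors, whose only Nash equilibrium is uniform play. The sketch below is plain normal-form regret matching, not full CFR over a game tree, and it uses expected payoffs instead of sampled actions so the run is deterministic. All names (`match_regrets`, `action_values`, `self_play`, `exploitability`) and the lopsided starting regret are assumptions made for the example.

```python
# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N = 3


def match_regrets(regret):
    """Probabilities proportional to positive regret, uniform if there is none."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / N] * N


def action_values(player, opponent_strategy):
    """Expected payoff of each of `player`'s actions against a mixed opponent."""
    if player == 0:
        return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N))
                for a in range(N)]
    return [sum(-PAYOFF[a][b] * opponent_strategy[a] for a in range(N))
            for b in range(N)]


def self_play(iterations):
    """Run regret matching for both players and return their average strategies."""
    # Assumption: player 0 starts with regret for rock, so play does not
    # begin at the uniform equilibrium and stay there trivially.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [match_regrets(r) for r in regrets]
        for p in (0, 1):
            values = action_values(p, strategies[1 - p])
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(N):
                # Regret: how much better action a would have done than the mix.
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(player, strategy):
    """Best-response payoff the opponent can earn against `strategy`."""
    return max(action_values(1 - player, strategy))
```

Calling `self_play(10000)` and passing each average strategy to `exploitability` should give values close to zero, while the per-iteration strategies keep cycling. That is why the guarantee is stated for the average.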