by VodkaHaze 3423 days ago
Not really, because CFR doesn't only take regret as the loss function; it also uses the method of "regret matching". That is, the mixed strategy for the next iteration is set proportional to the positive part of the cumulative counterfactual regret (which you keep track of while iterating), falling back to uniform when no action has positive regret.
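A minimal sketch of regret matching on its own, assuming a two-player zero-sum matrix (normal-form) game rather than the extensive-form games CFR targets. Full CFR applies this same update separately at every information set, using counterfactual values; that part is not shown here. The function names `regret_matching` and `self_play` and the 2x2 payoff matrix are hypothetical, chosen only for illustration:

```python
def regret_matching(cumulative_regret):
    """Return a mixed strategy proportional to the positive part of the
    cumulative regrets, or uniform if no action has positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(cumulative_regret)] * len(cumulative_regret)


def self_play(payoff, iterations):
    """Run regret matching for both players of a hypothetical zero-sum
    matrix game. payoff[i][j] is the row player's payoff when row plays i
    and column plays j; the column player receives -payoff[i][j].
    Returns both players' average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        # Current strategies come from regret matching on cumulative regret.
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        # Expected utility of each pure action against the opponent's mix.
        row_utils = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                     for i in range(n_rows)]
        col_utils = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                     for j in range(n_cols)]
        row_value = sum(x[i] * row_utils[i] for i in range(n_rows))
        col_value = sum(y[j] * col_utils[j] for j in range(n_cols))
        # Accumulate regret: how much better each pure action would have done
        # than the mixed strategy actually played this iteration.
        for i in range(n_rows):
            row_regret[i] += row_utils[i] - row_value
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_utils[j] - col_value
            col_sum[j] += y[j]
    # The average strategy, not the last iterate, is what converges.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


if __name__ == "__main__":
    # Hypothetical game whose unique equilibrium is (0.4, 0.6) for both players.
    x_avg, y_avg = self_play([[2, -1], [-1, 1]], 20000)
    print(x_avg, y_avg)
```

In this example both average strategies approach `[0.4, 0.6]`. Regret matching is a no-regret procedure, so in two-player zero-sum games the average strategy profile converges toward a Nash equilibrium even though the per-iteration strategies may keep oscillating.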

Since you seem to know what I'm talking about, is the innovation here i) a new objective function (counterfactual regret) and ii) a method to optimise that objective (regret matching)? I'm familiar with bandit algorithms and reinforcement learning, and on a very quick skim could not work out the exact difference here.