| HN Mirror



	by VodkaHaze 3423 days ago
	Not really, because CFR does not just use regret as the loss function; it also relies on the "regret matching" method. That means setting each action's probability in the next iteration's mixed strategy proportional to its positive cumulative counterfactual regret (which you keep track of while iterating). If no action has positive regret, the strategy falls back to uniform.
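To make the regret matching step concrete, here is a minimal sketch for a single decision point. The function names (`regret_matching`, `update_regret`) and the list-based representation are assumptions made for illustration, not taken from any particular CFR implementation. In full CFR the utilities would be counterfactual values weighted by the opponents' reach probabilities; here they are plain per-action payoffs.

```python
def regret_matching(cumulative_regret):
    """Return a mixed strategy proportional to positive cumulative regret.

    Falls back to the uniform strategy when no action has positive regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, action_utilities, strategy):
    """Add this iteration's regret for each action to the running totals.

    The regret of an action is its utility minus the expected utility
    of the strategy that was actually played.
    """
    expected = sum(s * u for s, u in zip(strategy, action_utilities))
    return [r + (u - expected) for r, u in zip(cumulative_regret, action_utilities)]


# One iteration: start with no regret, play uniform, observe utilities.
regrets = [0.0, 0.0, 0.0]
strategy = regret_matching(regrets)              # uniform over 3 actions
regrets = update_regret(regrets, [1.0, 0.0, -1.0], strategy)
next_strategy = regret_matching(regrets)         # all mass on action 0
```

Note that the next strategy is a normalisation of the positive regrets, not the raw regrets themselves, which is why negative regret never contributes probability mass.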

1 comment

noelwelsh 3423 days ago

Since you seem to know what I'm talking about, is the innovation here (i) a new objective function (counterfactual regret) and (ii) a method to optimise that objective (regret matching)? I'm familiar with bandit algorithms and reinforcement learning, and on a very quick skim I could not work out the exact difference here.
