


	by noelwelsh 3423 days ago
	Since you seem to know what I'm talking about, is the innovation here (i) a new objective function (counterfactual regret) and (ii) a method to optimise that objective (regret matching)? I'm familiar with bandit algorithms and reinforcement learning, but on a very quick skim I could not work out the exact difference here.
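
	For context on the second term in the question, here is a minimal sketch of plain regret matching in a one-shot game. It is not the counterfactual variant, which applies this kind of update at each information set of an extensive-form game. The rock-paper-scissors payoff matrix, the fixed opponent strategy, and every function name are assumptions made for illustration only.

```python
# Hypothetical sketch: regret matching for one player in rock-paper-scissors
# against a fixed opponent mixed strategy (assumed for illustration).

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b]: our payoff when we play a and the opponent plays b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


def regret_matching(opponent, iterations=1000):
    """Return the average strategy after running regret matching."""
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent.
        action_values = [
            sum(PAYOFF[a][b] * opponent[b] for b in range(ACTIONS))
            for a in range(ACTIONS)
        ]
        # Expected payoff of the current mixed strategy.
        expected = sum(strategy[a] * action_values[a] for a in range(ACTIONS))
        for a in range(ACTIONS):
            # Regret: how much better action a would have done than what we played.
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


if __name__ == "__main__":
    # An opponent that slightly over-plays rock.
    print(regret_matching([0.4, 0.3, 0.3]))
```

	Against an opponent that over-plays rock, the average strategy in this sketch concentrates on paper. Unlike a typical bandit setting, the update here uses the value of every action, not only the one actually played.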