Hacker News new | ask | show | jobs
by noelwelsh 3423 days ago
Since you seem to know what I'm talking about, is the innovation here i) a new objective function (counterfactual regret) and ii) a method to optimise that objective (regret matching)? I'm familiar with bandit algorithms and reinforcement learning, and on a very quick skim could not work out the exact difference here.