by noelwelsh
3423 days ago
Since you seem to know what I'm talking about: is the innovation here (i) a new objective function (counterfactual regret) and (ii) a method to optimise that objective (regret matching)? I'm familiar with bandit algorithms and reinforcement learning, but on a very quick skim I could not work out the exact difference here.