by rck 2510 days ago

Here's a tutorial on counterfactual regret minimization that does what you're looking for:

I started converting the Java to Python a while back, but got distracted by other work:

That will learn optimal play against a fixed opponent. If you change the code to make both players use regret minimization, it should still converge.
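To make the self-play idea concrete, here's a rough sketch of plain regret matching on rock-paper-scissors with both players learning at once. This is not the tutorial's code or my port of it; the names (`payoff`, `regret_matching`, `self_play`, `initial_regrets`) are made up for illustration. I'm also assuming two things to keep the run deterministic: regrets are updated with expected utilities rather than sampled actions, and player 0 gets an arbitrary starting bias toward rock so there is something to unlearn. The thing to look at is the average strategy, which in a two-player zero-sum game like this should drift toward the equilibrium (one third each) even though the per-iteration strategies keep cycling.

```python
NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors


def payoff(mine, theirs):
    """Utility for playing `mine` against `theirs`: +1 win, 0 tie, -1 loss."""
    return (0, 1, -1)[(mine - theirs) % NUM_ACTIONS]


def regret_matching(regrets):
    """Mix over actions in proportion to positive regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def self_play(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Both players run regret matching; returns their average strategies.

    `initial_regrets` is a hypothetical starting bias for player 0 (here:
    always rock), used only so the dynamics are not trivially stuck at uniform.
    """
    regrets = [list(initial_regrets), [0.0] * NUM_ACTIONS]
    strategy_sums = [[0.0] * NUM_ACTIONS for _ in range(2)]

    for _ in range(iterations):
        # Both strategies are fixed before either player updates.
        strategies = [regret_matching(r) for r in regrets]
        for player in range(2):
            mine, theirs = strategies[player], strategies[1 - player]
            # Expected utility of each pure action against the opponent's mix.
            action_values = [
                sum(theirs[b] * payoff(a, b) for b in range(NUM_ACTIONS))
                for a in range(NUM_ACTIONS)
            ]
            # Expected utility of the mix this player actually used.
            expected = sum(mine[a] * action_values[a] for a in range(NUM_ACTIONS))
            for a in range(NUM_ACTIONS):
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += mine[a]

    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(self_play(20000)):
        print(player, [round(p, 3) for p in average])
```

Swapping one player's `regret_matching(...)` call for a fixed list of probabilities gives you the fixed-opponent setup again, so the same loop covers both cases.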