by rck 2510 days ago
Here's a tutorial on counterfactual regret minimization that does what you're looking for:

http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

I started converting the Java to Python a while back, but got distracted by other work:

https://github.com/RichardKelley/cfr/blob/master/rps.py

That will learn the best response to a fixed opponent strategy. If you change the code so that both players use regret minimization, it should still converge, with each player's average strategy approaching the Nash equilibrium.
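
For anyone who wants the core idea without opening either link, here is a minimal sketch of regret matching for rock-paper-scissors against a fixed opponent. It is not the code from the tutorial or from my repo. The names (`payoff`, `strategy_from_regrets`, `train`) and the opponent mix of 0.4/0.3/0.3 are assumptions picked for illustration.

```python
import random

ROCK, PAPER, SCISSORS = 0, 1, 2
NUM_ACTIONS = 3


def payoff(a, b):
    """Utility to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]


def strategy_from_regrets(regret_sum):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet, so fall back to uniform play.
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def train(opponent_strategy, iterations, seed=0):
    """Return the learner's average strategy after the given number of rounds."""
    rng = random.Random(seed)
    actions = list(range(NUM_ACTIONS))
    regret_sum = [0.0] * NUM_ACTIONS
    strategy_sum = [0.0] * NUM_ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regret_sum)
        for a in actions:
            strategy_sum[a] += strategy[a]
        # Sample one action for each side.
        my_action = rng.choices(actions, weights=strategy)[0]
        opp_action = rng.choices(actions, weights=opponent_strategy)[0]
        realized = payoff(my_action, opp_action)
        # Regret of action a: what it would have earned minus what we earned.
        for a in actions:
            regret_sum[a] += payoff(a, opp_action) - realized
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


if __name__ == "__main__":
    # Hypothetical opponent that slightly overplays rock.
    print(train([0.4, 0.3, 0.3], 50000))
```

Against that opponent, paper has the highest expected payoff (0.4 - 0.3 = 0.1, versus 0 for rock and -0.1 for scissors), so the average strategy should end up putting most of its weight on paper. Note that it is the average strategy that settles down, not the per-round strategy.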