http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
I started converting the Java to Python a while back, but got distracted by other work:
https://github.com/RichardKelley/cfr/blob/master/rps.py
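That will learn optimal play against a fixed opponent. If you change the code so that both players use regret minimization, it should still converge: in self-play the players' average strategies (not their current ones) approach the Nash equilibrium, which for rock-paper-scissors is playing each move with probability 1/3.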
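
For reference, here is a rough, self-contained sketch of the two-player version. It is not the code from the linked repo: the names (`payoff`, `get_strategy`, `train`) are my own, and as a simplification it updates regrets with expected payoffs against the opponent's current mixed strategy instead of sampling an action each round. Player 0 is also given an arbitrary initial bias toward rock, since otherwise both players would start at uniform and never move.

```python
NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors


def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    if a == b:
        return 0
    return 1 if (a - b) % NUM_ACTIONS == 1 else -1


def get_strategy(regret_sum):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def train(iterations):
    """Run regret matching for both players; return their average strategies."""
    # Assumption for illustration: player 0 starts biased toward rock so the
    # dynamics do not simply sit at the uniform strategy from the first step.
    regret_sums = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * NUM_ACTIONS for _ in range(2)]

    for _ in range(iterations):
        # Both players pick their strategy before either one updates.
        strategies = [get_strategy(r) for r in regret_sums]
        for p in range(2):
            opp = strategies[1 - p]
            # Expected payoff of each pure action against the opponent's mix.
            action_values = [
                sum(opp[b] * payoff(a, b) for b in range(NUM_ACTIONS))
                for a in range(NUM_ACTIONS)
            ]
            # Expected payoff of the mixed strategy actually played.
            expected = sum(
                strategies[p][a] * action_values[a] for a in range(NUM_ACTIONS)
            )
            for a in range(NUM_ACTIONS):
                regret_sums[p][a] += action_values[a] - expected
                strategy_sums[p][a] += strategies[p][a]

    # The time-averaged strategy is the quantity that converges.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(train(100000)):
        print(player, [round(x, 3) for x in avg])
```

Both printed rows should end up close to 1/3 for every move. If you print `get_strategy(...)` for the current regrets instead, you will likely see it keep cycling, which is why the average is the thing to look at.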