Hacker News new | ask | show | jobs
“Rock Paper Scissors” Trained in Browser AI/ML (heartbeat.fritz.ai)
38 points by GantMan 2508 days ago
2 comments

FYI, the author trained a model to recognize what symbol your hand is making, not a model for ideal play.

Identifying the hands is pretty slick, but I'd actually find a model of ideal play to be even slicker, because there is no "optimal strategy" for RPS, but there is an "optimal strategy" against each individual opponent. Making an AI that can learn that strategy would be pretty impressive.

Yeeaars ago I remember reading one optimal play against people you don't know, that's supposed to give a slight advantage over multiple rounds: Repeat the move your opponent used in the prior round.

IIRC, the idea was that people tend not to repeat the same move, and then of the other two, it's slightly more likely they'd pick the one on their mind already - the one they were hoping their opponent would've chosen in the prior round.

Humans suck at being random, basically.
Interesting links for studying RPS AI:

* http://www.rpscontest.com/ * https://daniel.lawrence.lu/programming/rps/

Playing rock paper scissors is about predicting what the opponent will play next given a history of moves.

Prediction is the dual problem of compression. For example if the history of moves is "RRR", you can try to compress the strings "RRRR", "RRRP", and "RRRS". Clearly the first one ("four Rs") is more compressible than the other two, so it is most likely that the next move in the sequence is also an R.

However, there is no universal compression [1][2] so there is indeed no universal optimal RPS player.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity [2] http://mattmahoney.net/dc/dce.html

Fwiw, I made a simple model[0] when trying tf.js a year ago which could consistently beat me and friends if you play enough against it.

Wish I had kept it training with the data from people who used it but it's on a static github pages site.

0. https://svilentodorov.xyz/rps

That's neat, though easily beatable by doing not human-like RPS strats. (Rock x 3, Paper x 3).
Here's a tutorial on counterfactual regret minimization that does what you're looking for:

http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

I started converting the Java to Python a while back, but got distracted by other work:

https://github.com/RichardKelley/cfr/blob/master/rps.py

That will learn optimal play against a fixed opponent. If you change the code to make both players use regret minimization, it should still converge.

I believe it was The New York Times or New Yorker that hosted one a few years ago.

I think it considered all players the same opponent and used something like a markov model for predicting moves.

If you took it one game at a time it was really hard to beat. Apparently we all kinda feel now is the right time to switch to scissor. My only strategy for not loosing was to decide on the next 5 moves as randomly as possible and stick with it, as it would choose it’s moved based on wins and losses and I wouldn’t, making me less predictable.

> because there is no "optimal strategy" for RPS

there's a nash equilibrium? does that not count?

Take a look at the winning strategies from this RPS contest: https://webdocs.cs.ualberta.ca/~darse/rsbpc.html
Humans tend to fall into patterns that are easily described by n-gram frequency. A pretty good RPS bot is a typical second/third year university project.
Can approach RPS (Rock, Paper, Scissors) strategy as a relatively simple case of classic von Neumann-Morgenstern two person game theory. That theory follows from the classic duality result in linear programming.

The main result for players Red and Blue for RPS is: Red plays each of Rock, Paper, Scissors with probability 1/3rd and independent of everything else in the known universe. Can do this by using, say, an ordinary die with six sides. Then just from the strong law of large numbers, in the long run Red and Blue will both break even, no matter what Blue did, does, or will do.

To win, Red needs some means of predicting Blue's play -- no good predictions, no winning. I.e., for Red to win, Blue has to be predictable in some sense.