by zone411 2447 days ago
I expected better from this site than multiple commenters spreading false info!

This is wrong. Making optimal decisions IS very complicated. Even a heads-up no-limit match between the two best players in the world is far from the Nash equilibrium (a game-theory-optimal solution). In the 2017 Humans vs AI match, a bot destroyed very good humans by 14.7 big blinds per 100 hands. This is without caring at all about what strategies others play. If you analyze other players and find what mistakes they make, you can do better, and it gets even more complicated. And vs multiple players like here, it's more complicated again (optimal strategy can only be calculated assuming there is no collusion). A 6-person game was only finally beaten a few months ago! https://www.nature.com/articles/d41586-019-02156-9

Source: used to play poker (up to the main event WSOP and $5/10 level during the poker boom) and wrote a simple game-theory optimal solver for fun.
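
For anyone wondering what "solving" toward a Nash equilibrium looks like in code, here is a minimal sketch using regret matching on rock-paper-scissors. It is a toy stand-in chosen for brevity, not the C++ solver mentioned above and not the method of any particular bot. The payoff table, the function names, and the lopsided starting regrets are all assumptions made for illustration. Both players keep shifting weight toward actions they regret not having played, and it is the average strategy over all iterations, not the final one, that approaches the equilibrium.

```python
# Payoff to the acting player: PAYOFF[mine][theirs], 0=rock, 1=paper, 2=scissors.
# RPS is symmetric, so the same table serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform
    return [p / total for p in positive]


def solve_rps(iterations=50000):
    """Return the average strategy of each player after self-play."""
    # Hypothetical lopsided start so the iteration has something to correct.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)
            ]
            current_value = sum(
                strategies[player][a] * action_values[a] for a in range(3)
            )
            for a in range(3):
                # Regret = how much better action a would have done than the mix.
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(solve_rps()):
        print(player, [round(p, 3) for p in average])
```

The averages should land close to one third per action for both players. As I understand it, poker solvers apply the same regret idea at every decision point of a far larger game tree, which is where the cost comes from.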

2 comments

As an addendum to this very good comment, that Nature article had a couple of big asterisks.

* Bet sizes were restricted. E.g. humans and the bot could only bet fixed bet sizes, like 1/4 pot, 1/2 pot, full pot etc. Creative bet sizing is one of the skills that distinguishes top pros.

* Stack sizes were reset after every hand. E.g. every player in the hand was given the same amount of chips at the start of every hand. How you performed previously in the session thus did not matter. Anyone who has played poker knows that this is highly unrealistic. Larger stack sizes confer an ability to bully smaller ones, and stack sizes greatly affect what range of hands you can reasonably play.

The point being, even a supercomputer running the most efficient heuristic-based poker decision-making programs has not yet been able to beat humans in a game that resembles a real 6- or 9-person table.

---

Just for reference, on a four-year-old quad-core/8-thread Intel i7-based desktop with 32 GB of RAM, solving a SINGLE hand in PioSolver (the most popular poker solver) from flop through the river takes my machine about 7 minutes. The game tree alone takes up 4 GB of RAM, and in this scenario there are only two players, and each player is restricted to 3 bet sizes.

The idea that this kind of computation can be done on a phone is ludicrous.
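
To give a feel for why even a two-player flop-to-river tree gets big, here is a rough back-of-envelope count. It is only a sketch under simplifying assumptions: it ignores card removal between the two players' hands, ignores the betting tree entirely, and says nothing about how PioSolver actually stores things. The function name and the returned keys are made up for illustration.

```python
from math import comb


def postflop_counts(deck_size=52, flop_cards=3):
    """Crude upper-bound counts for one flop, ignoring card removal and betting."""
    unseen = deck_size - flop_cards          # cards not on the flop
    hole_combos = comb(unseen, 2)            # two-card holdings per player
    runouts = unseen * (unseen - 1)          # ordered (turn, river) pairs
    # Hand-vs-hand-vs-runout combinations, before any betting lines are added.
    matchups = hole_combos * hole_combos * runouts
    return {
        "unseen": unseen,
        "hole_combos": hole_combos,
        "runouts": runouts,
        "matchups": matchups,
    }


if __name__ == "__main__":
    print(postflop_counts())
```

Every betting line allowed by the 3 bet sizes multiplies this again, so the memory figure above is not surprising.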

Hmmm I could be wrong but I believe it's not true that humans could only bet fixed sizes. Instead, the AI was only pretrained with fixed sizes and had to do some kind of live search algorithm for any size outside of those values, which could be what you're referring to.

Stack sizes were reset to keep the research minimally scoped. Taking stack sizes into account likely does not require a quantum leap in research.

This is getting pretty off topic, but the computation could be done online.

Yeah, seems like you're correct here.

I went back and re-read the pre-print here (https://www.cs.cmu.edu/~noamb/papers/19-Science-Superhuman.p...). On page 2:

> To reduce the complexity of forming a strategy, Pluribus only considers a few different bet sizes at any given decision point. The exact number of bets it considers varies between one and 14 depending on the situation. Although Pluribus can limit itself to only betting one of a few different sizes between $100 and $10,000, when actually playing no-limit poker, the opponents are not constrained to those few options. What happens if an opponent bets $150 while Pluribus has only been trained to consider bets of $100 or $200? Generally, Pluribus will rely on its search algorithm, described in a later section, to compute a response in real time to such “off-tree” actions.
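
To make the "off-tree" problem concrete, here is a sketch of the crudest possible workaround: snapping an opponent's bet to the nearest size the strategy was trained on. This is explicitly NOT what the quote describes (Pluribus falls back to real-time search), just a simpler stand-in that shows what gets lost. The function name and the tie-breaking rule are assumptions.

```python
def nearest_in_tree_bet(bet, tree_sizes):
    """Map an arbitrary bet to the closest trained bet size.

    Ties go to the smaller size (an arbitrary choice for this sketch).
    """
    return min(tree_sizes, key=lambda size: (abs(size - bet), size))


if __name__ == "__main__":
    # The $150 bet from the quote, with only $100 and $200 in the tree.
    print(nearest_in_tree_bet(150, [100, 200]))
```

A $150 bet sits exactly between the two trained sizes, so any rounding rule treats it as something it is not, which is presumably why the paper relies on search instead.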

Good catch, and thanks for the correction.

Regarding the effect of stack sizes, I'm not certain on this, but my intuition is that there is some effect on perceived ranges of the other 5 players at the table if stack sizes vary. Since Facebook AI will not be releasing Pluribus code or pre-trained models/weights, we can't be certain, but things like stack-to-pot ratio (SPR) would seem to matter.
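
For reference, SPR is just the effective stack (the smallest stack among the players still contesting the pot) divided by the pot. A quick sketch, with a made-up function name and made-up chip counts:

```python
def stack_to_pot_ratio(stacks, pot):
    """Effective stack divided by the current pot.

    stacks: remaining chips of each player still in the hand.
    pot: chips already in the middle (must be positive).
    """
    if pot <= 0:
        raise ValueError("pot must be positive")
    effective_stack = min(stacks)  # nobody can win more than the shortest stack
    return effective_stack / pot


if __name__ == "__main__":
    # Hypothetical spot: equal stacks vs. one short stack, same pot.
    print(stack_to_pot_ratio([10000, 10000], 500))
    print(stack_to_pot_ratio([10000, 4000], 500))
```

With stacks reset every hand, every hand starts from the same SPR. With varying stacks the ratio moves from hand to hand, which is the effect I'm guessing at above.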

Of course, you could always make the argument that human players in a cash game can re-up/refill to the maximum buy-in whenever they're short, but that's another discussion altogether.

Do you still have that GTO solver kicking around? How'd you build it?
I do! But it would take a few days to turn it into a usable GitHub repo. I'll try to get to it sometime. It's written in C++.
If you don’t mind, it’d be awesome if you could email it to me (in profile). We’re both engineers here, I don’t need anything super polished. :)

Just interested to see your approach as I’m in the middle of writing my own.