DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

	DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker (arxiv.org)
	102 points by maurycy 3444 days ago

MikeTV 3444 days ago

> DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold'em

Whether any others have been made before now is anyone's guess. Botting is a known problem in online poker. If there's a golden goose out there, I'm sure it's being kept under wraps.

zitterbewegung 3444 days ago

You can also run multiple colluding bots or use other tactics, which means succeeding at botting in Texas hold'em is not equivalent to the achievement presented in the paper.

femto113 3444 days ago

I believe the primary botting "problem" is not rule breaking activity like collusion but the farming of lower-skill players at lower limits than a professional would be willing to play at. A bot will happily rake in a 1x big-blind/hour advantage that a comparably skilled human would consider a complete waste of time. It's my understanding that the real state of the art here is not in the play algorithms (existing bots are more than good enough to beat weaker players) but in avoiding detection by both human and automated monitors.

bcassedy 3444 days ago

Correct. When I last played for income in 2012, the site I played on had 1-2 bots at virtually every table from the $10 buyin cash games all the way up to the $200 buyin games. Around the time I stopped playing it came out that there was a botting ring that had been winning at $1000 buyin games for some time.

Most of their income does come from the weaker players at the table, but many of these bots were good enough to break even or do slightly better against the pros at the table too.

valarauca1 3444 days ago

Most of these bots are likely using standard, well-established techniques.

Strategic play, card counting, etc. A handful of heuristics make you an above average player and will get you thrown out of a casino. They'd be trivial to program.

I guess all that is novel here is a bot learned these techniques on its own.

LewisJEllis 3444 days ago

With all due respect, do you have any idea what you're talking about? This is poker, not blackjack. Card counting is useless and there's no strategy that will get you thrown out of a casino because it's not the casino's money you're playing for.

I'm not sure what sort of "standard well-established techniques" that "would be trivial to program" you're talking about. Optimal play in no-limit hold'em is a tremendously complicated mixed strategy with a massive decision tree. To make things even more interesting, in almost any given situation there will be multiple "correct" plays that, to avoid being exploitable, should each be made some percentage of the time at random.

This AI is novel because it achieved a result that (as far as anybody knows) has never been achieved before, not because it figured out how to do something on its own.

pdog 3444 days ago

The big games are typically heads up to avoid the collusion problem.

gallerdude 3443 days ago

I always think the same thing about neural nets on the stock market.

osti 3444 days ago

To be fair, none of the so-called pros are considered big names in today's no-limit heads-up games. They should probably challenge people like WCGRider, Jungleman, etc. next.

On another point, CMU just can't seem to catch a break, their thunder continuously being stolen by UofAlberta in poker research, first in limit, now no limit. UofA clearly tried to publish this before the CMU poker challenge that's supposed to begin soon.

To read more about the CMU challenge: http://www.cmu.edu/news/stories/archives/2017/january/poker-...

grizzles 3444 days ago

Doug Polk is WCGRider.

dsp1234 3444 days ago

Doug Polk is not one of the professionals who was part of this study. The list is in Table 1 of the paper.

osti 3444 days ago

I'm aware, lol. I'm a fan of his videos.

natecarroll 3443 days ago

The players they recruited were incentivized by an $8,000 prize pool up for grabs among the 34 of them, so the average EV is about $235. They have to play 3,000 hands to get a shot at that money, which is probably around 10 hours of multitabling. So that's ~$24/hr in expectation.
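
The back-of-the-envelope numbers above are easy to check. A minimal sketch using only the figures quoted in the comment; the 10-hour estimate is the commenter's guess, and averaging the whole pool evenly over all 34 players is the same simplification the comment makes:

```python
PRIZE_POOL = 8000   # dollars, paid to the top finishers against the bot
PLAYERS = 34
HOURS = 10          # commenter's estimate for 3,000 hands of multitabling

ev_per_player = PRIZE_POOL / PLAYERS   # average expected prize per participant
hourly = ev_per_player / HOURS         # expected dollars per hour of play

print(f"average EV: ${ev_per_player:.0f}, about ${hourly:.1f}/hour")
```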

And then of course you don't get anything unless you're one of the top three winners against the bot, so there's likely nothing to be gained from grinding out a marginal victory. You should just go ahead and play kinda stupid/aggro and hope you win some of the big flips and whatnot. There's literally nothing at stake for you except time value, so you might as well flame out early and then quit or run up a big stake to give yourself a shot at top 3.

Basically, the study design ensures the bot faces off against weak players playing in a way that would be sub-optimal in any other situation. Not surprised the bot won by a decent margin, nor that they are trying to spin this real hard in advance of the CMU poker bot matchup next week, which will be much more rigorous.

lawn 3444 days ago

I think I can wrap my head around neural nets being superior at games with perfect information like chess or go. But how would you teach bluffing and randomness to a neural net?

bcassedy 3444 days ago

Former poker pro here.

Top professionals are building their strategy around game theory. They'll attempt to play in such a way that they aren't exploitable and look to deviate when they've spotted a weakness in their opponent's play.

Basically, the game theory optimal strategy is unexploitable. In every situation, the best you can do is break even by also playing the optimal strategy. If you deviate from optimal strategy, the optimal strategy will beat you, but it's possible that a strategy tailored to taking advantage of your specific deviations would beat you more quickly.

Unexploitable play typically means that you bet a size with a range of holdings that would make your opponent indifferent to all of his options (and the converse is true when facing a bet). For humans, this means that they gravitate to a few standard bet sizes, while a computer could, in theory, balance their range with much more granularity.
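
The indifference idea can be made concrete with the standard river toy model: the bettor holds either the nuts or a pure bluff and bets b into a pot of p, and the caller holds a hand that beats only bluffs. This is a textbook simplification, not how any particular bot computes its play. Bluffing with a fraction b/(p+2b) of the betting range makes the caller indifferent, and calling with a fraction p/(p+b) of hands makes a pure bluff break even:

```python
from fractions import Fraction

def bluff_fraction(pot, bet):
    """Share of the betting range that should be bluffs so a bluff-catcher is indifferent."""
    return Fraction(bet, pot + 2 * bet)

def defense_frequency(pot, bet):
    """How often the caller must continue so that a pure bluff breaks even."""
    return Fraction(pot, pot + bet)

def caller_ev(pot, bet):
    """EV of calling with a bluff-catcher against the balanced betting range."""
    x = bluff_fraction(pot, bet)
    # win pot + bet against a bluff, lose the call against the nuts
    return x * (pot + bet) - (1 - x) * bet

def bluff_ev(pot, bet):
    """EV of a pure bluff when the caller defends at the indifference frequency."""
    c = defense_frequency(pot, bet)
    return (1 - c) * pot - c * bet

for pot, bet in [(100, 50), (100, 100), (100, 200)]:
    print(pot, bet, bluff_fraction(pot, bet), defense_frequency(pot, bet))
# 100 50 1/4 2/3
# 100 100 1/3 1/2
# 100 200 2/5 1/3
```

Larger bets allow more bluffs and require the caller to defend less often, which is one reason finer-grained bet sizing gives a computer more room to balance.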

Last I read, for training the neural net it'll play billions+ hands against versions of itself designed to exploit various weaknesses. It'll start out by performing random actions, for example, say it'll have a 33% chance to call your bet, raise, or fold. It then starts to see that it does better when it raises your bet with the nuts and also as a bluff. Eventually, it arrives at an equilibrium strategy.
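
The "start out random, then drift toward equilibrium" process resembles regret matching, the update rule at the core of counterfactual regret minimization (CFR). The sketch below illustrates only that general idea on a made-up lopsided rock-paper-scissors in which winning with rock pays 2; it is not DeepStack's training procedure, and the payoff matrix is hypothetical. Each player shifts probability toward actions it regrets not having played, and the *average* strategy approaches this game's equilibrium (1/4, 1/2, 1/4):

```python
# Hypothetical lopsided rock-paper-scissors: A[a][b] is the row player's payoff
# (rock beats scissors for 2). The game is zero-sum and symmetric.
A = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]
N = len(A)

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N   # no positive regret yet: play uniformly at random

def self_play(iterations):
    regrets = [[0.0] * N for _ in range(2)]
    strategy_sum = [[0.0] * N for _ in range(2)]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        # expected payoff of each pure action against the opponent's current mix
        u1 = [sum(A[a][b] * s2[b] for b in range(N)) for a in range(N)]
        u2 = [sum(-A[a][b] * s1[a] for a in range(N)) for b in range(N)]
        for player, (s, u) in enumerate([(s1, u1), (s2, u2)]):
            value = sum(p * x for p, x in zip(s, u))
            for a in range(N):
                regrets[player][a] += u[a] - value   # how much better action a would have done
                strategy_sum[player][a] += s[a]
    # the average strategy, not the last one, is what converges to equilibrium
    return [[x / iterations for x in row] for row in strategy_sum]

avg = self_play(100_000)
print([round(p, 3) for p in avg[0]])   # should be close to [0.25, 0.5, 0.25]
```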

Since computers are much better at randomness than humans are, they're able to more effectively play these types of strategies and with more complexity of bet sizing. There is what's called a mixed strategy, a strategy where given a situation with the same hole cards you will call, raise, or fold to a bet with some non-zero probability. Doing that as a human is very difficult, but it's something computers manage to do quite easily.
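
Executing a mixed strategy is trivial for a program: draw from a seeded random number generator and pick each action with its assigned probability. A minimal sketch, with a made-up 20/50/30 fold/call/raise mix for a single spot (the probabilities are illustrative, not from the paper):

```python
import random
from collections import Counter

# Hypothetical mixed strategy for one decision point (probabilities sum to 1).
STRATEGY = {"fold": 0.2, "call": 0.5, "raise": 0.3}

def choose_action(strategy, rng):
    """Sample one action according to the strategy's probabilities."""
    actions = list(strategy)
    weights = list(strategy.values())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(2017)   # seeded so runs are reproducible
counts = Counter(choose_action(STRATEGY, rng) for _ in range(10_000))
freqs = {a: counts[a] / 10_000 for a in STRATEGY}
print(freqs)   # each frequency lands near its target probability
```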

IgorPartola 3444 days ago

Since you are here, I have a few questions about your former job.

First, how does one become a pro poker player?

Second, does it work like a sport where you get paid from sponsorships, or do you just directly take home what you win? Or a combination of both?

Third, is this something that you can do part-time, or does it require full time attention?

Fourth, why did you quit?

bcassedy 3444 days ago

Sure, happy to answer questions.

Note that "professional" in this context just means someone who made a living through poker. There are a ton of people like myself who earned a good living playing online or in casinos. A small minority received long-term sponsorships.

1. I started playing in high school and college. Like everything else I do, I dove in to get better, and eventually I was good enough that I was making good money playing online. At that point I started to pursue it full time.

2. For me it was the latter. The very top players in skill and visibility are typically the ones getting sponsorships. Win a big tournament, win at the highest level of online games, or make a tv appearance at a final table and you'll find opportunities for sponsorships. For most players, these sponsorships really just add stability to their income with the bulk still coming from winnings. Though there are the poker personalities who make the bulk of their income from sponsorships and TV deals.

3. You can definitely do it part time at some levels, but to keep up it does require quite a bit of study time to win at meaningful stakes.

4. When the US legal landscape changed in 2011, getting your money out of the sites that were still willing to serve Americans got more challenging. Moving out of the country wasn't an option for me. With the game getting harder all the time due to the proliferation of good strategy material and botting, it seemed like it was time to move on.

IgorPartola 3444 days ago

Thanks! One more question for you and @jat850:

So with the online games, etc. is the point just that you are better than the next guy to join the game so you just end up winning against players that are not as good? How reliable is something like that? It's true that a sucker is born every minute, but is it really feasible to make a living off it vs just staying income-neutral (e.g. win some, lose some, not really make any profit)?

bcassedy 3444 days ago

It's fairly reliable. Due to the random nature of poker, it's very easy to delude yourself into thinking that you're a better player than you are. This is true for hobbyists, gamblers, and professionals alike. Professional players may move up in stakes or even stay at the same stakes and think that they're stronger than the competition and write off their bad performance as a run of bad luck. Runs of bad luck (and good) can significantly skew your results even over a hundred thousand hands.

Because of this, players of all skill levels will play a long time in games where they are not favored. This is part of why it's so important to keep studying both your play and that of your opponents. If you rest on your laurels, those that do study will surpass you and you'll find yourself playing at a disadvantage without realizing it.

Online you're really able to make a good amount off even a small edge, as measured in big blinds per 100 hands, because hands are dealt so quickly and you can play more than one table at a time. For most of my career I was playing ~1000 hands/hour. In reality due to the rake, the cut the site takes from each pot, you need to have a substantial edge to make money, but this isn't as hard to achieve as you might think.
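
The volume argument can be made concrete by converting a win rate in big blinds per 100 hands into dollars per hour. A minimal sketch: the 1,000 hands/hour figure comes from the comment, while the $0.50 big blind, the 5 bb/100 pre-rake edge, and modeling rake as a flat 3 bb/100 are hypothetical simplifications (real rake is a percentage taken from each pot):

```python
def hourly_profit(bb_per_100, rake_bb_per_100, big_blind, hands_per_hour):
    """Dollars per hour after rake, with win rate and rake both in big blinds per 100 hands."""
    net_bb_per_100 = bb_per_100 - rake_bb_per_100
    return net_bb_per_100 * big_blind * hands_per_hour / 100

print(hourly_profit(5, 3, 0.50, 1000))   # 10.0: small edge, huge volume
print(hourly_profit(5, 3, 0.50, 100))    # 1.0: same edge at a tenth of the volume
print(hourly_profit(3, 3, 0.50, 1000))   # 0.0: an edge equal to the rake earns nothing
```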

smnplk 3444 days ago

A hobby poker player here.

The main goal of any poker game is to win money. You win money from your opponents; you don't play against the house. However, the house (poker site) does take a small percentage of every pot you win; this is called rake. Note that in some games, rake can be so high that it is not profitable to play the game even if you are much better than your opponents. The only way to win money in the long run in poker is to be better than your opponents, so you need to look for bad opponents. This is a well-known strategy called table selection or "bum hunting" :)

Nowadays it is still very feasible to make a living off poker, but you will have a better chance of finding bad players in live games.

jat850 3444 days ago

Not who you're replying to, but I can give a few answers, having also played professionally for some time.

1. There are lots of ways this can happen. "Professional poker player" is generally taken to mean "derives primary source of income from playing poker" or sometimes "spends the majority of their time playing poker". There's no exact qualification.

2. Not generally sponsorship driven, although some modes of sponsorship do factor in. The primary means of income is basically winning money from other players. A secondary component in online play is often rakeback. In cash games, you can join and leave at any time, and your winnings or losses are simply the amount you are up or down in that particular session. In tournament play there is a payout structure based on your placement in the tournament (often something like 25% of the total prize pool to 1st, 15% to second, etc.)

3. It can be done part time. There is nothing to say that your bankroll can't be seeded or supplemented by external means, and you can play cash games for short or long periods. Tournaments are typically long (at least if you remain in them a long time, though you can be eliminated at any time, basically). I used to play approximately 40 hours a week (since I was treating it like a job), but now I play 10-15 and it represents about 30-40% of my yearly income.

4. I personally quit because I found the stress of having it as my sole means of supporting myself too overwhelming. One can have a stretch of negative earnings that lasts hours, days, weeks, even months, and it can be psychologically damaging in some ways. I also found that I preferred keeping it as a hobby rather than a means of income, since I enjoyed it more that way.

IgorPartola 3444 days ago

Thanks! Replied to a sibling comment to yours with one more question.

tgb 3444 days ago

Is it known that there is an optimal strategy?

osti 3444 days ago

In heads-up, i.e. one-on-one, there is a Nash equilibrium, but in a multiplayer game there isn't, because the other players can collude against you.

bcassedy 3444 days ago

We know that an optimal strategy exists, but we still aren't close to achieving it. It's a zero-sum game where both parties have the same lack of information and the betting order rotates. I think it has to have an optimal strategy.

jspiral 3444 days ago

At higher levels poker is about game theory, for example, the player bluffs at an optimal frequency in a certain situation so as to be indifferent to whether the opponent calls or folds.

Exploitative strategies, based on understanding opponent weaknesses and tendencies will win $ at a higher rate, but are themselves exploitable.

For example, almost never bluffing and playing only strong cards crushes beginners who play too many hands and call too much.

This strategy is easily beaten though by stealing most pots and then not paying off the infrequent big bets (strong hands don't come often enough).

A "perfect" game theory strategy is like armor, slowly bleeding the opponent every time they deviate from perfection themselves.

Not sure if that helps, but maybe it gives you some seeds to google at least.

jdmichal 3444 days ago

> This strategy is easily beaten though by stealing most pots and then not paying off the infrequent big bets (strong hands don't come often enough).

To dig deeper:

Or you can try actively punishing the big hands by folding early. Of course, that strategy opens you up to being bled by an opponent who bluffs as if holding strong hands. Attempting to actively punish the big hands here is a deviation. This is what jspiral means by "deviate from perfection".

falcolas 3444 days ago

I imagine it's mostly just playing the percentages. Bet when it has a high percentage of winning, fold when it doesn't. It doesn't need to read its opponents if it can play the percentages perfectly.

ska 3444 days ago

No, this is a terrible idea. If you play like this consistently, you are basically telling your opponents when to play against you and when to get out of the way. They can even pick bet sizes (assuming no limit) to refine your possible hands very accurately.

jdmichal 3444 days ago

I'm usually surprised in tournaments by the number of people willing to play with me after I've sat quietly folding for the first three rounds... By all means, everyone should fold and give me the blinds, but there always seems to be a player or two who bite!

splike 3444 days ago

This is a very naive interpretation of the game of poker. Professional human players are already very good at calculating the percentages, and any joe playing from his computer has access to a calculator.

The reason why simply playing the numbers fails is that if I know an opponent is playing this way, I'll just fold every time he decides to play.

reverend_gonzo 3444 days ago

This is true for Limit Hold'em, but very much not true for No Limit. Limit Hold'em is a solved game, because as long as you're playing the odds, you can play perfectly. No Limit changes things because the bets can vary wildly. If you play a tight game (just play the odds), an opponent will get out whenever you're in, and will bluff just to see if you call or fold.

Bluffing is a major component in No Limit, and there are very different profitable playing strategies.

jdmichal 3444 days ago

I fell to this when I randomly decided to play some limit hold'em one night. Kept losing to a guy that chased every chance at odds he could, because I couldn't make bets big enough to scare him out. Lesson learned!

osti 3444 days ago

That's not true at all for limit, the perfect bot's bluff frequency is probably higher than most humans. Play against it yourself here http://poker.srv.ualberta.ca

tomarr 3444 days ago

I don't think this is true? At least for small blinds. If you know your opponent is playing the percentages you could heighten your threshold.

ChuckMcM 3444 days ago

I love it, research that pays for itself :-) I think of poker and other card games as imperfect but predictable information. So while you don't know what cards the other players have, you can certainly estimate the likelihood of what they have and prune your choices that way. Think single-deck card counting in Blackjack.
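
Estimating "the likelihood of what they have" is a Bayesian update over the opponent's possible holdings. A toy sketch: the three hand categories, the prior, and the betting frequencies below are all hypothetical numbers chosen for illustration:

```python
from fractions import Fraction as F

# Hypothetical prior over the opponent's hand category.
prior = {"strong": F(1, 5), "medium": F(1, 2), "weak": F(3, 10)}
# Hypothetical probability that the opponent bets with each category.
p_bet = {"strong": F(9, 10), "medium": F(3, 10), "weak": F(1, 5)}

def posterior_after_bet(prior, p_bet):
    """Bayes' rule: P(hand | bet) is proportional to P(bet | hand) * P(hand)."""
    joint = {h: prior[h] * p_bet[h] for h in prior}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

post = posterior_after_bet(prior, p_bet)
print({h: str(p) for h, p in post.items()})
# {'strong': '6/13', 'medium': '5/13', 'weak': '2/13'}
```

A single bet moves "strong" from 20% to about 46% here, which is also why a player whose betting tracks hand strength too closely gives information away.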

esseti 3444 days ago

The fact that they used hearts and spades instead of numbers for the author affiliations is just lovely.

philosopheer 3444 days ago

most people (including here on HN) are complete n00bs when it comes to understanding how poker is played and how computers can play it, so just to straighten y'all out at the get-go here:

computers are better at bluffing and randomness than humans are. Bluffing is an important optimizing strategy in playing poker well, and it entails tracking the expected value of a pot (which includes cost expectations, don't forget) and it entails randomness, necessary to obfuscate patterns of betting that could give away evidence of your bluffing strategy. Like chess and go, we may not be "there" yet with computers, but n00bs need to understand the theory.

What computers can't do is read "tells", so if you are a master poker player via tells (whether it's unconscious or conscious thinking on your part) then you will beat other humans better than a computer will; but, by the same token, the computer will not give you tells to read nor be fooled by your fake tells. I think the mistake newbies (even highly experienced ones) make is mixing together "the psychology" of the game with the mathematics of the game.

So to give an oversimplified concrete example of a poker bluffing strategy (inspired by Nesmith Ankeny's book): if the odds of drawing one of the cards you need to win a showdown are 1 in 4 but the expected payoff is 20x, then you not only need to stay in purely on expected value, but it is also an optimal time to bluff if you don't get your card. It is informationally better to have a bluffing strategy that masquerades as an "I have good cards" strategy and gives random information after the showdown, rather than "bluffing" being something you do solely when you have shit cards. And to enforce a random strategy on yourself, he recommends using the cards in your hand as a random number generator that tells you whether to bluff or not: as you can see, his strategy designed for human players is more perfectly implemented by a computer.
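
A quick check of the expected-value arithmetic in that example, reading "20x" as winning 20 units for each unit you put in and assuming you win only when you hit your card (both readings are assumptions; the function names are illustrative):

```python
from fractions import Fraction

def call_ev(p_win, payoff, cost=1):
    """EV of putting in `cost` to win `payoff` with probability p_win, losing `cost` otherwise."""
    return p_win * payoff - (1 - p_win) * cost

def break_even_payoff(p_win, cost=1):
    """Smallest payoff that makes the call break even."""
    return (1 - p_win) * cost / p_win

p = Fraction(1, 4)
print(call_ev(p, 20))         # 17/4, i.e. +4.25 units per unit called
print(break_even_payoff(p))   # 3: any payoff better than 3-to-1 makes the call profitable
```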

feral 3444 days ago

No. If the only thing computers couldn't beat humans at was reading tells, they'd win at online poker.

But they don't yet do that: this paper is about beating humans at heads up, which is a much more limited domain than a full table.

If you want to learn about why to bluff I'd recommend reading about using game theory to solve Kuhn poker.
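
Kuhn poker (three cards J < Q < K, one-chip antes, a single one-chip bet) is small enough to check by hand, and its known equilibrium family has bluffing built in: the first player bets a Jack with some probability α between 0 and 1/3, and the second player bets a Jack with probability 1/3 after a check. The sketch below evaluates that textbook equilibrium family rather than solving the game from scratch, and confirms the first player's expected value is -1/18 for any α in that range:

```python
from fractions import Fraction as F
from itertools import permutations

J, Q, K = 0, 1, 2  # card ranks; the higher card wins at showdown

# Player 2's equilibrium strategy (the same for every alpha).
P2_CALL = {J: F(0), Q: F(1, 3), K: F(1)}  # P(call) when player 1 bets
P2_BET = {J: F(1, 3), Q: F(0), K: F(1)}   # P(bet) when player 1 checks

def showdown(c1, c2, amount):
    """Player 1's result when `amount` chips per player are at stake at showdown."""
    return amount if c1 > c2 else -amount

def player1_ev(alpha):
    """Player 1's expected value under the equilibrium family with parameter alpha."""
    p1_bet = {J: alpha, Q: F(0), K: 3 * alpha}        # P(bet) on the first action
    p1_call = {J: F(0), Q: alpha + F(1, 3), K: F(1)}  # P(call) after check, then bet
    total = F(0)
    for c1, c2 in permutations([J, Q, K], 2):   # the 6 equally likely deals
        # Player 1 bets: player 2 calls (showdown for 2) or folds (player 1 wins 1).
        ev_bet = P2_CALL[c2] * showdown(c1, c2, 2) + (1 - P2_CALL[c2]) * 1
        # Player 1 checks, player 2 bets: player 1 calls (showdown for 2) or folds (-1).
        ev_facing_bet = p1_call[c1] * showdown(c1, c2, 2) - (1 - p1_call[c1])
        # Player 1 checks: player 2 either bets or checks back to a showdown for 1.
        ev_check = P2_BET[c2] * ev_facing_bet + (1 - P2_BET[c2]) * showdown(c1, c2, 1)
        total += p1_bet[c1] * ev_bet + (1 - p1_bet[c1]) * ev_check
    return total / 6

for alpha in (F(0), F(1, 6), F(1, 3)):
    print(alpha, player1_ev(alpha))   # -1/18 each time
```

Removing the Jack bluffs from either player's strategy lets the other side profitably deviate, which is the point of the exercise: bluffing is part of the equilibrium, not a psychological extra.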

philosopheer 3444 days ago

online poker has the tremendous flaw that collusion between players is the optimal strategy, and there is just noooo way to stop it. Collaborating poker-bots who outsource their peppy poker chatter to Bangalore (your feedback is important to them!) will soon be running all the tables if they aren't already. No, they'll never be the champions, because that suboptimal strategy would lead to discovery, but as a giant grist-milling farm grinding out profit, it seems irresistible.

I did a quick google review of Kuhn poker and I don't see how any of that would not benefit from the understanding I was attempting to convey in my initial post.

6nf 3443 days ago

Collusion is avoided entirely by playing heads up only.

jpolitz 3444 days ago

Does this imply that a pro may well do better in a multiplayer game with mixed humans and machines (by using "tells" to build up a bigger stack from the humans' inaccuracies), than in heads up against a machine?

jdmichal 3444 days ago

Players can and will target others as "easier" and selectively get in fights with them. If nothing else, you'll certainly avoid getting into fights with a player that continuously beats you.

geofft 3444 days ago

As an actual complete n00b, in online poker, how are tells communicated when you can't see someone's facial expressions or body language? My guess is the dollar value of bets, and the timing side channel?

jdmichal 3444 days ago

Bet amount certainly plays a big part. This is what philosopheer means when he talks about "the mathematics of the game". Bet amount is part of that mathematical part; it's basically a signal of your confidence against the current size of the pot. This is why, when you do reading, many strategies will talk about bet amounts as multipliers of the current pot.

Timing can be informative, but it's actually weaker online than in person. In person, you know whether the person is physically present, and can generally gauge when they're paying attention also. Online, taking a long time could simply mean that they're not paying attention. (I've watched streams on Twitch of pro players working multiple tables online.)

brador 3444 days ago

Heads-up is solvable by just crunching the known probabilities, so I'm not sure what the achievement is here. Maybe the complexity of the work involved in building the program is worthy of merit? Not sure.
