by noambrown 2534 days ago
The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.

It does bluff, but that’s not my point. My issue is that it bluffs without consideration of its opponent. High-level strategic play of most games is about adapting to your opponent’s play. This bot does not do that. It is secretly a giant lookup table of game state to response.

In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions because lots of criticism towards something like StarCraft AI is that it is strategically stupid and doesn’t adapt. The StarCraft AI is admittedly kind of stupid right now, but we may hit a wall on its creativity simply because creativity is, despite human intuition, a dumb idea.

If you think about it, any AI that's stopped learning and is now efficiently doing pattern matching or pattern completion (assuming memory and attractor states), instead of running a complex search, is arguably a fancy lookup table hashed by similarity. This includes humans. In other words, lookup table isn't the slight most think it is. But the bot does do real time search so it's not "merely doing" a look-up.

Because poker is not sub-game solvable (it is not possible to self-locate within the tree), this bot's play has to get into its opponent's mindspace in a sense. To not be exploitable, it essentially has to infer the other players' hidden state and paths from observed actions. This isn't something I've seen in Dota, Starcraft, Chess, Go bots.

It's true that it doesn't learn online to find exploitable patterns of other players, but doing this without also making yourself exploitable in turn is a separate, very difficult problem. Low-exploitability, near-optimal play according to game-theoretic notions is considered strategy.

While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots is the closest to playing that way of the major game-playing bots.

I don’t mean look up table as a bad thing. I mean it’s a lookup table on game state, without incorporating any information about the players. But good points
> High-level strategic play of most games is about adapting to your opponent’s play.

Is this true in any meaningful sense?

For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.

Adaptability is certainly not necessary (almost by definition) if you're playing a near-equilibrium strategy, but adaptability is a useful skill to have in a general non-stationary world.

That said, for this bot, I wouldn't say it's playing completely independent of the other players' interior state. Pluribus must infer its opponents' strategy profile and, according to the paper, maintains a distribution over possible hole cards and updates its belief according to observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect information game.
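
To make the "distribution over possible hole cards" idea concrete, here is a minimal sketch of a single Bayesian belief update. It is not Pluribus's actual code: the three hand buckets, the prior, and the raise probabilities are all made-up numbers for illustration, and `update_belief` is a hypothetical helper name.

```python
def update_belief(prior, likelihood):
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(action | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Assumed prior over coarse buckets of the opponent's hole cards.
prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}

# Assumed probability that the opponent raises with each bucket.
p_raise = {"strong": 0.8, "medium": 0.4, "weak": 0.1}

# Belief after observing a raise: weight shifts toward stronger hands.
posterior = update_belief(prior, p_raise)

for bucket in prior:
    print(f"{bucket}: {prior[bucket]:.2f} -> {posterior[bucket]:.2f}")
```

The point is only that the belief depends on observed actions, even if the model of how opponents act comes from self-play rather than from studying a specific player.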

> Pluribus must infer its opponents' strategy profile

This is what interests me. It doesn’t do this. In fact, because it played against itself only, it should be assumed that the only strategy profile it considers is its own.

You're right that it uses itself as a prototype for decisions, but the fact that it also maintains a probability distribution over possible hole cards and updates it according to observed actions is already richer than the local-decision-only approach taken by almost all other bots. This is sort of forced by the simplicity of poker's action space combined with the large search space and imperfect information. Here, the simplicity ends up making things more difficult! They also use multiple play styles as "continuation strategies" so it's a bit more robust. And to be fair, I suspect many human players use themselves and their experience as a substitute too.
> For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

In an n-player game, a table can be in a (perhaps unstable) equilibrium which the "optimal" strategy will lose at. This has been demonstrated for something as simple as iterated prisoners' dilemma (tit-for-tat is "best" for most populations, but there are populations that a tit-for-tat player will lose to). I don't play poker but I've definitely experienced that in (riichi) mahjong - if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.
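
A small sketch of the prisoners' dilemma point, assuming the usual textbook payoffs (5/3/1/0), 10 rounds per match, and a round-robin table of four players. The strategy and function names are my own, not from any library.

```python
from itertools import combinations

# (my payoff, their payoff) for each pair of moves; C = cooperate, D = defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play_match(strat_a, strat_b, rounds=10):
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

def round_robin(population, rounds=10):
    # Every player meets every other player once; return total scores.
    totals = [0] * len(population)
    for i, j in combinations(range(len(population)), 2):
        score_i, score_j = play_match(population[i], population[j], rounds)
        totals[i] += score_i
        totals[j] += score_j
    return totals

friendly = [tit_for_tat, tit_for_tat, tit_for_tat, always_defect]
hostile = [tit_for_tat, always_defect, always_defect, always_defect]

print(round_robin(friendly))  # [69, 69, 69, 42]: tit-for-tat comes out on top
print(round_robin(hostile))   # [27, 34, 34, 34]: the lone tit-for-tat finishes last
```

Same strategy, different table, different result, which is the mahjong experience in miniature.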

Well in online poker high level players make great use of player tagging, taking notes about players they have played before and what they've done in important hands or their patterns. Software exists to track how opponents behave in any given situation, and if it pops up again you use that.

I would think if professional players are utilising this information, a bot could benefit from it. I don't see how they would ever lose out from this information, even if it only uses situations where the opponent has a history of 100% of the time responding a certain way.

I am impressed by the bot but I have to laugh a bit because years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands, it seemed so useless we obviously didn't make it, we've evidently been proven wrong. (pointless tangent there)

Player tagging just makes you exploitable. I play one way now, you tag me "Haha, fool bet-folds way too much" and then I change it up to exploit you, "Huh, I keep trying to fold him out with worse and he doesn't bite even though my notes say he will".

The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance the optimal play will be stochastic and so you can stare at the noise and think you see a pattern, that just means you'll play worse against it, because you're trying to beat a ghost.

For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.
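
Spelling out the arithmetic in that example, assuming each decision is an independent draw from the same 10% mixed strategy (the variable names are just for illustration):

```python
from fractions import Fraction

raise_freq = Fraction(1, 10)  # optimal strategy raises $50 10% of the time

# Chance the note-taker happens to see two raises in a row.
p_two_in_a_row = raise_freq ** 2

# Chance the "always raises $50 here" note is wrong on the next occasion.
p_note_wrong_next = 1 - raise_freq

print(p_two_in_a_row)     # 1/100
print(p_note_wrong_next)  # 9/10
```

So the streak is rare for any single spot, but over enough spots and enough opponents somebody will see it and write the note.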

You would be a fool to act based off only 2 instances of seeing a particular behaviour. That's why you have to weigh up how many instances you've seen. Sometimes if it's less than X instances it's not worth considering that particular statistic as valid.

Now say I have thousands of hands viewed against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have only 10 hands I've observed, that same stat means nothing.
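
One way to put numbers on "10 hands means nothing, thousands of hands means a lot" is a confidence interval on the observed frequency. This sketch uses the Wilson score interval at roughly 95% confidence; that choice is mine for illustration, and tracking software may well do something different.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width

# Same observed 50% pre-flop raise rate, very different sample sizes.
small_sample = wilson_interval(5, 10)
large_sample = wilson_interval(500, 1000)

print("10 hands:   %.2f to %.2f" % small_sample)
print("1000 hands: %.2f to %.2f" % large_sample)
```

With 10 hands the plausible range for the true raise frequency spans roughly a quarter to three quarters; with 1000 hands it is pinned to within a few points of 50%.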

The theoretical optimal play depends on who you're playing, as more value could be extracted in certain situations vs certain players.

For example, say I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time. That would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and be a more optimal play than some default. Contrast that with playing someone who calls pre-flop 3-bets 75% of the time: it wouldn't be optimal to 3-bet bluff there. Different opponents, different optimal plays.
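
A rough sketch of why the fold frequency matters so much for a 3-bet bluff. This is a deliberately crude model: the pot and bet sizes (in big blinds) are invented, and it assumes the bluff simply loses its full size whenever the opponent doesn't fold, ignoring any post-flop equity.

```python
def bluff_ev(fold_prob, pot, risk):
    """EV of a pure bluff: win the pot when they fold, lose the bet otherwise."""
    return fold_prob * pot - (1 - fold_prob) * risk

def breakeven_fold_prob(pot, risk):
    """Fold frequency at which the bluff has zero expected value."""
    return risk / (pot + risk)

POT = 4.5   # assumed: blinds plus an open raise, in big blinds
RISK = 9.0  # assumed: size of the 3-bet bluff, in big blinds

ev_vs_folder = bluff_ev(0.99, POT, RISK)  # opponent folds 99% of the time
ev_vs_caller = bluff_ev(0.25, POT, RISK)  # opponent continues 75% of the time
needed = breakeven_fold_prob(POT, RISK)

print(f"vs 99% folder: {ev_vs_folder:+.3f} bb")
print(f"vs 75% caller: {ev_vs_caller:+.3f} bb")
print(f"break-even fold frequency: {needed:.1%}")
```

Under these assumptions the same bluff is clearly profitable against the first player and clearly losing against the second, with the break-even point at a two-thirds fold frequency.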

I think we need to make a distinction between two kinds/styles of play:

1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.

2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.

You most often see 1 online, and 2 live, for obvious reasons.

A skilled human would be a lot more successful, I believe, than a bot in case 2. For 2, important skills are:

1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. that are bad at poker but want to play high stakes) are hard to find, it is vital that you retain them.

2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.

3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.

Note that 1) is only true if your opponent is also not making many mistakes, which fails to be true for most humans, for whom the combination of randomization and calculating state-appropriate ranges is very difficult. This means that weak players can still lose heavily from mistakes/poor play within a reasonable number of hands, so the expected value need not be slim.

Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.

Player tagging is more complicated than a single game, and goes far deeper than playing a few hands one way and then switching it up. You can have player stats based on thousands of hands, and you can know things about your opponent that even they don't know.

I don't think you play very much, which is fine, but makes this discussion a bit pointless.

> In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

Adaptability is beaten by perfect strategic play in games with clear victory conditions.

My familiarity with optimal control theory is nil, but Kydland (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question though, because while the victory conditions in economic policy are clearly defined, the surrounding environment is very far from static, so you deal with out-of-training-set data regularly. Once AI can deal with these kinds of out-of-context problems it seems plausible GAI is a matter of time.

http://www.finnkydland.com/papers/Rules%20Rather%20than%20Di...

> Rules Rather than Discretion: The Inconsistency of Optimal Plans

> Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end-of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.

"Strategic" is probably the wrong word, but I think there is a valid question here regarding the approach the AI is taking. One of the key things for a good poker player is having the ability to adapt and adjust their strategy depending on how others at the table are playing. Sometimes you can have the exact same cards in the exact same position and in one game it is smart to fold and in another game it is smart to raise. From the description in the article, it doesn't appear that this AI takes those ebbs and flows into consideration. Instead it seems to play "purely mathematically optimally on expected value" that was honed through trillions of simulations.

There is a cliche about how poker is about playing your opponents and not the cards. Is this AI only focusing on its cards and ignoring its opponents?

The AI doesn't adapt to the opponents, and that's still an interesting challenge for AI research. That said, at the end of the day, it was making quite a bit of money playing against elite human pros. I think that suggests the cliche is, at least in part, wrong.
Making "quite a bit of money" still leaves open the possibility that the AI is leaving a lot of money on the table by not taking opponents into consideration.

Also I would be curious to see how it performs against people that aren't "elite human pros". Would this AI win at a higher rate in a game against average recreational players compared to the rate a pro would win?

Lastly it is also possible that the pros simply didn't have enough time to adapt to the AI which would be extra important considering the AI plays unlike humans and therefore is harder to predict.

I think the bot would make a lot of money playing against average recreational players, but it's absolutely true that if you can exploit bad players' weaknesses, then you can make more money than what the bot would earn.

We played 10,000 hands over 12 days in the 5 humans + 1 AI experiment. That's quite a long time, and there's no indication that they even began to uncover any weaknesses in that time period. So I'm fairly confident the AI is robust to exploitation, and I think that's a very important quality to have in any AI system.

That 10,000 total hands number isn't particularly meaningful on the point of adaptability because the humans aren't sharing information with each other. The important number is how many hands each individual human played against the AI. Another question would be whether the pros knew which player was the AI? Because if they didn't, you are basically throwing a modified Turing Test against the pros before they can even begin to try to find tendencies in the AI. Predicting opponents is a huge part of how people play poker. If the AI plays unlike any human, pros are at a huge disadvantage against an AI compared to how they would fare against a similarly skilled but more traditional human player.

None of this is meant to diminish what you all accomplished, I'm just highlighting areas of poker in which this AI would be less successful than humans even if it is more successful overall.

The humans knew the whole time which player was the bot.
There was an interesting IRL poker game a few years ago. The player who was running behind started going all in on every hand without even looking at their hand (with a huge amount of success).

Out of curiosity, how does a bot deal with oddities like this?

This is a solved problem. Open-shoving is a feature of sit-n-gos, so of course people have simulated these and compiled so-called "pushbot tables". The parameters are basically pot size and winning probabilities against a random hand.

While this particular bot may not have those programmed in, a more powerful variant eventually will.

The mathematical theory explored in the paper is that if multiplayer poker isn't one of the multiplayer finite state games that pathologically fails to converge to a Nash equilibrium, then it has one, and this strategy should approximate it. Intuitions about adaptability and the advantages thereof aren't applicable in the scenario where the opponent is playing to a Nash equilibrium. You can perform equally well by participating in the other side of the Nash equilibrium, but anything else is a losing strategy. The fact that this approximation converges to a strategy that's actually really good suggests that there is a Nash equilibrium, and that the converged-upon strategy is converging on it.

You can't out-think or adapt to a rock-paper-scissors opponent who selects at random. All you can do is also select at random and accept that the two of you have even odds.
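
The rock-paper-scissors claim is easy to check directly. A small sketch with exact fractions (+1 for a win, -1 for a loss, 0 for a tie; all names are mine):

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(my_mix, their_mix):
    """Expected payoff for me when both players use mixed strategies."""
    return sum(my_mix[m] * their_mix[t] * payoff(m, t)
               for m in MOVES for t in MOVES)

uniform = {m: Fraction(1, 3) for m in MOVES}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}
rock_heavy = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}

# Against a uniformly random opponent, every strategy earns exactly zero.
print(expected_payoff(always_paper, uniform))     # 0
print(expected_payoff(rock_heavy, uniform))       # 0

# Against an opponent who deviates from uniform, there is something to exploit.
print(expected_payoff(always_paper, rock_heavy))  # 1/4
```

The last line is the flip side of the argument: exploitation only pays when the opponent has actually left the equilibrium.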