Hacker News
by noambrown 2388 days ago
Hi! I'm one of the authors on the paper. We'd be happy to answer any questions. Ask us anything!
5 comments

Hey Noam, this is some great work; I'll need to sit down and give the paper a deeper read. Also, the visualizations on this blog post are incredible.

I saw a talk on the Libratus agent a while back, and one of the most interesting takeaways was that the behavior of the bot had already started to impact the professional players, who now spontaneously bet large amounts to force other players out of a hand. Were there any behaviors your agent demonstrated that surprised you in the same way? What insights might we draw from this cooperative AI system that may have more general applicability to other planning domains?

In terms of Hanabi, this bot arrived at conventions that are pretty different from how humans play the game. We invited an advanced Hanabi player to play with the bot and he pointed out a few things in particular that he'd like to start using. For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.
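To make the difference between the two conventions concrete, here is a minimal sketch. It is only an illustration of the rules as described above, not the bot's actual code. The function names and the hand representation (slots indexed from oldest, 0, to newest, `hand_size - 1`, with `hinted` holding the slots touched by one hint) are assumptions made for this example.

```python
from typing import Optional, Set


def human_convention(hinted: Set[int]) -> Optional[int]:
    """Common human rule: play the newest of the hinted cards.

    Slots are indexed oldest (0) to newest, so the newest hinted
    card is the one with the largest index.
    """
    return max(hinted) if hinted else None


def bot_convention(hand_size: int, hinted: Set[int]) -> Optional[int]:
    """Rule described for the bot: if the card just picked up was
    hinted, play it; otherwise play the oldest hinted card."""
    if not hinted:
        return None
    just_drawn = hand_size - 1  # hypothetical indexing: newest card is last
    if just_drawn in hinted:
        return just_drawn
    return min(hinted)


# A hint touches slots 1 and 3 in a five-card hand.
print(human_convention({1, 3}))   # newest hinted slot
print(bot_convention(5, {1, 3}))  # oldest hinted slot, since slot 4 was not hinted
```

Under the bot's rule a single hint can target an older card even when a newer card shares its color or number, which is where the extra flexibility comes from.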

I think one important general lesson is that search is really, really important. Deep RL algorithms are making huge advancements, but Deep RL alone can't reach superhuman performance in Go or poker without search. Here, too, we see that search was the key to conquering this game, and I think that will hold true in more complex real-world settings as well. Figuring out how to extend search to more complex real-world settings will be a challenge, but it's one worth pursuing.

> For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.

I've definitely seen advanced Hanabi players use a more subtle version of that rule: "If your hint looks like it's telling me to play my leftmost hinted card, how long has that card been playable? If it could have been hinted for play a long time ago, and it's just being hinted now, it must not be playable. So what else must you mean...?"

That version of the rule allows for more subtle cases. Suppose you hint that a player's second-from-the-left and fourth-from-the-left cards are both red. If there hasn't been an opportunity to hint the second-from-the-left since it became playable, go ahead and play the second-from-the-left. If there have been opportunities to hint second-from-the-left, play fourth-from-the-left.

That rule requires human players to model whether the other players' actions in the interim have been "urgent" things that needed taking care of before hinting them, or whether those other players would have hinted them sooner if their card was playable.
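Here is a rough sketch of that timing-based rule, in case it helps. Everything in it is an assumption for illustration: the function name, the 1-based positions counted from the left, and the idea of reducing "were there opportunities to hint it sooner?" to a single count. In a real game that count is exactly the hard part, since it depends on judging which interim actions were urgent.

```python
from typing import Optional, Sequence


def card_to_play(hinted_positions: Sequence[int],
                 earlier_opportunities: int) -> Optional[int]:
    """Pick which hinted card to play under the timing-based convention.

    hinted_positions: positions touched by the hint, counted from the
        left, starting at 1 (hypothetical representation).
    earlier_opportunities: how many non-urgent chances the hinter already
        had to hint the leftmost hinted card since it became playable.
    """
    if not hinted_positions:
        return None
    ordered = sorted(hinted_positions)
    if earlier_opportunities == 0:
        # No earlier chance to give this hint, so read it at face value.
        return ordered[0]
    # The hint was delayed, so the leftmost card is presumably not the
    # target; fall back to the next hinted card, if there is one.
    return ordered[1] if len(ordered) > 1 else None


# Second and fourth cards from the left are hinted red.
print(card_to_play([2, 4], earlier_opportunities=0))
print(card_to_play([2, 4], earlier_opportunities=1))
```

Returning `None` when only one card was hinted late is a simplification: a human player would instead keep asking what else the hint could mean.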

Isn't this different from AI that can beat humans at games like Bridge, where the bidding game matters quite a lot? IIRC, in Hanabi the imperfect information does not matter that much for strategy, whereas in games like League of Legends or Bridge it matters quite a lot.
Bridge has a similar challenge, though from what I understand Bridge AIs are not superhuman yet. I suspect our techniques could be applied to Bridge, though they may need to be adapted a bit.

The imperfect information in Hanabi absolutely matters a ton. It's not an interesting game without it.

Can you come back in a day or so and answer some questions? I, like others, need some time to digest it.
Definitely!
Damn, I haven’t gotten around to fully reading Pluribus yet, and now there’s more? Congrats on the results! What’s next?
Thanks! We're looking in a few different directions, but one thing I'm excited about is mixed cooperative/competitive settings. In poker, there is no room for cooperation. In Hanabi, you are 100% cooperating with your teammates. But most real-world situations, like a negotiation, are somewhere in between. The AI techniques for these settings are not too strong yet.
What's FB going to do with this?
Open source it, learn from it, and build upon it to continue to push forward the frontier of AI.
That would be nice. What is Facebook AI's take on ethical use of its research?

Facebook probably pays through the nose for AI research and probably wants an ROI. Facebook makes money by building better user models and spamming targeted ads. Some of them are getting scarily good.