


	by gjstein 2386 days ago
	Hey Noam, this is some great work; I'll need to sit down and give the paper a deeper read. Also, the visualizations on this blog post are incredible. I saw a talk on the Libratus agent a while back, and one of the most interesting takeaways was that the behavior of the bot had already started to impact the professional players, who now spontaneously bet large amounts to force other players out of a hand. Were there any behaviors your agent demonstrated that surprised you in the same way? What insights might we draw from this cooperative AI system that may have more general applicability to other planning domains?


noambrown 2386 days ago

In terms of Hanabi, this bot arrived at conventions that are pretty different from how humans play the game. We invited an advanced Hanabi player to play with the bot and he pointed out a few things in particular that he'd like to start using. For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.
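To make the contrast concrete, here is a minimal Python sketch of the two conventions as described above. It is an illustration, not the bot's actual code: the slot numbering (0 is the card just drawn, larger numbers are older cards) and the function names `human_convention` and `bot_convention` are assumptions made for this example.

```python
from typing import List, Optional

# Assumption for this sketch: hand slots are numbered by age,
# with slot 0 being the card just picked up and larger numbers older cards.
NEWEST_SLOT = 0


def human_convention(hinted_slots: List[int]) -> Optional[int]:
    """Common human rule: play the newest of the hinted cards."""
    if not hinted_slots:
        return None
    return min(hinted_slots)  # smallest slot number = newest card


def bot_convention(hinted_slots: List[int]) -> Optional[int]:
    """Rule described for the bot: if the just-drawn card was hinted,
    play it; otherwise play the oldest hinted card."""
    if not hinted_slots:
        return None
    if NEWEST_SLOT in hinted_slots:
        return NEWEST_SLOT
    return max(hinted_slots)  # largest slot number = oldest card
```

With slots 1 and 3 hinted, the human rule picks slot 1 and the bot rule picks slot 3, so the hinter can target an older card without first getting the newer one played.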

I think one important general lesson is that search is really, really important. Deep RL algorithms are making huge advancements, but Deep RL alone can't reach superhuman performance in Go or poker without search. Here, too, we see that search was the key to conquering this game, and I think that will hold true in more complex real-world settings as well. Figuring out how to extend search to more complex real-world settings will be a challenge, but it's one worth pursuing.

JoshTriplett 2386 days ago

> For example, humans usually have a rule that if your teammate hints multiple cards of the same color/number, you should play the newest one. The bot uses a more complicated rule: if the card you just picked up was hinted then play that card, otherwise play the oldest hinted card. That gives you way more flexibility to hint playable cards that would otherwise be tough to get played.

I've definitely seen advanced Hanabi players use a more subtle version of that rule: "If your hint looks like it's telling me to play my leftmost hinted card, how long has that card been playable? If it could have been hinted for play a long time ago, and it's just being hinted now, it must not be playable. So what else must you mean...?"

That version of the rule allows for more subtle cases. Suppose you hint that a player's second-from-the-left and fourth-from-the-left cards are both red. If there hasn't been an opportunity to hint the second-from-the-left since it became playable, go ahead and play the second-from-the-left. If there have been opportunities to hint second-from-the-left, play fourth-from-the-left.

That rule requires human players to model whether the other players' actions in the interim have been "urgent" things that needed taking care of before hinting them, or whether those other players would have hinted them sooner if their card was playable.
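The two-card example above can be sketched in Python. This is only a rough illustration: the function name `card_to_play` and the boolean `had_earlier_chance` are hypothetical, and the hard part (deciding whether teammates really had a free chance to hint the card earlier) is passed in as an input rather than modeled. Falling through to the next hinted card on the right generalizes the second/fourth example and is an assumption, as is returning `None` when only one card was hinted and it is ruled out.

```python
from typing import List, Optional


def card_to_play(hinted_positions: List[int],
                 had_earlier_chance: bool) -> Optional[int]:
    """Pick which hinted card to play.

    hinted_positions: 1-based positions counted from the left.
    had_earlier_chance: True if teammates could already have hinted the
        leftmost hinted card after it became playable, but did not.
    """
    ordered = sorted(hinted_positions)
    if not ordered:
        return None
    if not had_earlier_chance:
        # No missed opportunity: the leftmost hinted card is the target.
        return ordered[0]
    if len(ordered) > 1:
        # The leftmost card would have been hinted sooner, so the hint
        # must mean the next hinted card (assumed generalization).
        return ordered[1]
    # Single hinted card that is ruled out: this sketch infers no play.
    return None
```

For the example in the comment, `card_to_play([2, 4], False)` selects position 2 and `card_to_play([2, 4], True)` selects position 4.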