Hacker News
by JoshTriplett 2388 days ago
In particular, communication is in some ways a complex multiplayer game requiring multi-level modeling of other participants. In the ideal case, it can be modeled as a cooperative game. In suboptimal cases that sadly occur often in the real world, it's a game where you're fractionally cooperating and fractionally competing with every other player, and the degree to which you're cooperating or competing with any given player depends on your model of them, which you update over time based on your observations of how they "play". And it's a helpful shortcut in modeling if you can group expectations of other "players" into common conventions.

Hanabi's multi-level modeling ("what is my model of each other player, and what is my model of their model of the other players, including me?") is remarkably deeper than it looks on the surface. Play it for long enough, and you start to handle situations where preconceived "conventions" don't help: "OK, of the four players at the table, three of them understand certain common conventions. One of them doesn't seem to understand at least one convention, based on the misfire they just had or caused (which is also consistent with their low player rating). I can probably assume they don't understand any conventions commonly considered more challenging than the one they just failed at. So if I give this hint, how will the more advanced players understand it, can I give it without the less advanced player misunderstanding it in a harmful way, and what will happen? Also, for future games, I should remember that this player doesn't know these conventions (yet), until I see evidence that they've improved. I might also consider helping them learn more common conventions. Or, if they don't know enough conventions and don't improve, I might not want to play with them in the future at all."
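
To make that reasoning concrete, here is a rough Python sketch of tracking which conventions each player seems to know. Everything in it is an assumption for illustration: the convention names, the difficulty ordering, and the numbers are hypothetical, and real players (or agents) reason in far richer ways.

```python
from dataclasses import dataclass, field

# Hypothetical convention names, ordered from easiest to hardest.
CONVENTIONS = ["play_left", "chop_save", "finesse", "bluff"]


@dataclass
class PlayerModel:
    name: str
    # Estimated probability that this player knows each convention.
    # The 0.9 prior is an arbitrary illustrative value.
    knows: dict = field(default_factory=lambda: {c: 0.9 for c in CONVENTIONS})

    def observe_failure(self, convention: str, penalty: float = 0.2) -> None:
        """After a misfire, lower the estimate for that convention and
        for every convention considered harder than it."""
        start = CONVENTIONS.index(convention)
        for c in CONVENTIONS[start:]:
            self.knows[c] = min(self.knows[c], penalty)

    def observe_success(self, convention: str, gain: float = 0.5) -> None:
        """Move the estimate toward 1 when the player shows improvement."""
        self.knows[convention] += gain * (1.0 - self.knows[convention])


def hint_is_safe(convention: str, table: list, threshold: float = 0.7) -> bool:
    """A hint relying on `convention` is treated as safe only if every
    player who will see it probably knows that convention."""
    return all(p.knows[convention] >= threshold for p in table)
```

For example, after `newbie.observe_failure("chop_save")`, a hint that relies on `"finesse"` is no longer considered safe at a table that includes `newbie`, while one that relies only on `"play_left"` still is.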


Unfortunately, Facebook's approach sidesteps this complexity by ensuring each player uses the same random seed and searches over policies based on information they can all see. In my opinion, it's not really solving the problem as intended.
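
As a toy illustration of why a shared seed matters (this is not Facebook's code, and the function name, seed, and state encoding are made up for the example): if every agent derives its randomness from the same seed plus publicly visible information, all agents reach the same "random" choice without communicating or modeling each other.

```python
import random


def shared_choice(shared_seed: int, public_state: tuple, candidates: list):
    """Pick a candidate using only the shared seed and public information.

    Any agent calling this with the same arguments gets the same answer,
    because seeding random.Random with a string is deterministic.
    """
    rng = random.Random(f"{shared_seed}|{public_state!r}")
    return rng.choice(candidates)


# Two agents that each see the same public state (hypothetical encoding).
public_state = ("turn", 12, "hints_left", 3)
policies = ["policy_a", "policy_b", "policy_c"]

agent_1_pick = shared_choice(42, public_state, policies)
agent_2_pick = shared_choice(42, public_state, policies)
```

Both picks are always identical here, which is the coordination the parent comment's scenario has to work for: a stranger at the table shares neither your seed nor your conventions.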
We are entirely focused on the self-play setting, in which the goal is to learn the highest-performing policy for a team of agents all trained together. The Hanabi Challenge also outlines an ad-hoc setting, in which you need to adjust on the fly to the diverse policies of the other agents on the team.