Hacker News new | ask | show | jobs
by bcoates 2386 days ago
If you're wondering why this is interesting, games that AIs excel at like chess/checkers/go are all two-player, zero-sum, perfect-information (everyone knows everything), deterministic games, so you can exactly predict your opponents behavior by simulating "what would I do if I were them, and trying to make me lose". The only real hard problem in this space is extreme branching factors.

Everything gets vastly more complicated once you break any of those rules: non-zero sum games create a prisoner's dilemma cooperate/defect dynamic, every three or more player game is non-zero sum (and exponentially more for every player you add), hidden information forces you to manage how much you reveal to your opponent and requires you to simulate multiple "alternate futures" based on things you learn after making a decision, and randomness is equivalent to an extra player that makes irrational unpredictable moves.

Games like that are vastly closer to the messy real world than the computationally expensive but near-ideal world of games like go, and they're much more of an open problem.

3 comments

In particular, communication is in some ways a complex multiplayer game requiring multi-level modeling of other participants. In the ideal case, it can be modeled as a cooperative game. In suboptimal cases that sadly occur often in the real world, it's a game where you're fractionally cooperating and fractionally competing with every other player, and the degree to which you're cooperating or competing with any given player depends on your model of them, which you update over time based on your observations of how they "play". And it's a helpful shortcut in modeling if you can group expectations of other "players" into common conventions.

Hanabi's multi-level "what is my model of each other player, and what is my model for their model of other players including me" is remarkably deeper than it looks on the surface. Play it for long enough, and you start to handle situations for which preconceived "conventions" don't help: "OK, of the four players at the table, three of them understand certain common conventions, one of them doesn't seem to understand at least one convention based on the misfire they just had/caused (which is also consistent with their low player rating), I can probably assume they don't understand any other conventions commonly considered more challenging than the one they just failed at, so if I give this hint, how will the more advanced players understand it, can I do so without the less advanced player misunderstanding it in a harmful way, and what will happen? And also, for future games, I should remember that this player doesn't know these conventions (yet) until I see evidence that they've improved. I might also consider helping them learn more common conventions. Or, if they don't know enough conventions and don't improve, I might not want to play with them in the future at all."

Unfortunately Facebook’s approach sidesteps this complexity by ensuring each player uses the same random seed and searches policy based on information they can all see. It’s not really solving the problem as intended in my opinion.
We are entirely focused on the self-play setting in which the goal is to learn the highest performing policy for a team of agents all trained together. The Hanabi Challenge also outlines an ad-hoc setting in which you need to adjust to the diverse policies of other agents in the team on the fly.
>randomness is equivalent to an extra player that makes irrational unpredictable moves.

Irrational and unpredictable moves are also present in two-player perfect information games.

The difference is that in something like chess, you see it happen and adjust. It probably isn't the strongest move and thus probably does more harm than good.

In something like starcraft there can be a lot of moves that can seem irrational because they are only strong if they are undiscovered for too long.

I just struggle to see the real challenge in this argument. I can see that simulating something more complex than a Go game is a real engineering challenge - simulating a Go game from start to finish is really easy apart from assigning a value function to each move.

That being said:

> hidden information forces you to manage how much you reveal to your opponent

It is hard to see how that could be harder for a learning AI to pick up than any other game action with long-term benefits. It might reveal that AI are so much better at humans in games like Go and Chess that they aren't even exploiting long-term links between cause and effects, but I suspect from the example we've seen with Go that this is a solved problem.

> and requires you to simulate multiple "alternate futures" based on things you learn after making a decision,

That is describing a tree search. Why is that theoretically difficult? I can see it is a real engineering challenge to simulate a complex environment.

> and randomness is equivalent to an extra player that makes irrational unpredictable moves.

Computers handle randomness much better than humans at all levels because they have an actual statistical grounding. They still aren't very good at it, but they sure thrash humans.

I suppose basically I can see why a hidden-information game could be intractable because simulating the environment is so hard that it can't be done and breakthroughs in simulation must be found to train the AI. But I don't see why that is being linked to hidden information and cooperative gameplay. We know humans do fine without utilising hidden information, because they can play these games and they don't know any hidden information.