by moconnor 2385 days ago
Unfortunately, Facebook’s approach sidesteps this complexity by ensuring each player uses the same random seed and searches for a policy based only on information they can all see. In my opinion, it’s not really solving the problem as intended.

We are entirely focused on the self-play setting, in which the goal is to learn the highest-performing policy for a team of agents all trained together. The Hanabi Challenge also outlines an ad-hoc setting, in which you need to adjust on the fly to the diverse policies of other agents in the team.