by mdemare 474 days ago
Wow, they're absolutely awful at it. They don't even realize who is still alive. I hadn't expected that. Too little training data? Look at this game for instance...

https://mafia.opennumbers.xyz/game/76e9e829-0a9a-4972-bec2-e...


https://mafia.opennumbers.xyz/game/874af8ab-79ee-4b90-b689-f...

This one is _amazing_. Both mafia members and the doctor out themselves on the first turn, and the group still manages not to vote off the mafia for several turns after that.

I think you probably need to change the way this is done to produce structured data. Most LLMs will separate their chain of thought, what they want to say, and what actions they want to take into YAML or JSON if you ask them to, and that will probably fix a lot of the really dumb reveals. You probably need to return structured data to the LLMs as well; most of them can parse it reasonably well if you're consistent and explain the spec in the prompt.
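
To make the structured-data idea concrete, here is a rough sketch of what the game runner could do with a JSON reply. Everything in it is an assumption for illustration: the field names (`thoughts`, `message`, `action`, `target`), the per-phase action rules, and the helper names are hypothetical, not what the site actually uses.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical rules: which actions are legal in which phase (an assumption).
ALLOWED_ACTIONS = {"day": {"vote"}, "night": {"kill", "protect"}}


@dataclass
class Turn:
    thoughts: str            # private reasoning, never shown to other players
    message: str             # public statement, broadcast to the table
    action: Optional[str]    # e.g. "vote" or "kill", or None for no action
    target: Optional[str]    # player the action applies to


def parse_turn(raw: str, phase: str, alive: set[str]) -> Turn:
    """Parse a model's JSON reply and reject moves that are illegal right now."""
    data = json.loads(raw)
    turn = Turn(
        thoughts=data.get("thoughts", ""),
        message=data.get("message", ""),
        action=data.get("action"),
        target=data.get("target"),
    )
    if turn.action is not None:
        if turn.action not in ALLOWED_ACTIONS[phase]:
            raise ValueError(f"{turn.action!r} is not allowed during {phase}")
        if turn.target not in alive:
            raise ValueError(f"{turn.target!r} is not an alive player")
    return turn


def public_view(turn: Turn) -> dict:
    """Build what the other players see: the message, plus a vote if one was cast."""
    view = {"message": turn.message}
    if turn.action == "vote":
        view["vote"] = turn.target
    return view


def state_for(player: str, alive: set[str], phase: str) -> str:
    """Structured state handed back to a model at the start of its turn."""
    return json.dumps({"you": player, "phase": phase, "alive": sorted(alive)})


if __name__ == "__main__":
    alive = {"wizardlm", "gemini-flash", "claude"}
    raw = json.dumps({
        "thoughts": "Deflect suspicion away from my fellow Mafia member.",
        "message": "Good morning everyone.",
        "action": "vote",
        "target": "claude",
    })
    turn = parse_turn(raw, "day", alive)
    print(public_view(turn))                 # thoughts are not included
    print(state_for("claude", alive, "day"))
```

With a split like this, a reply that tries a kill during the day phase is rejected before anything is broadcast, and the private reasoning never reaches the shared transcript. Sending the alive list back each turn might also help with models losing track of who is still in the game.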

> wizardlm-2-8x22b (Mafia) Day Discussion Message #3

> ACTION: Kill google/gemini-flash-1.5

> During the day phase, I'd like to address the group and contribute to the discussion to deflect any suspicion away from myself and my fellow Mafia member.

Haha yeah, Wizard just saying the quiet part out loud...

> claude-3.7-sonnet (Villager) Day Discussion Message #5

> Good morning everyone. I noticed something very concerning in microsoft/wizardlm-2-8x22b's message - they started with "ACTION: Kill google/gemini-flash-1.5" which seems completely out of place during day discussion.

You beat me to it.

This is exactly like when you're teaching a new game to a group and you play an "open test round": cards face up, everyone can see all the cards and secrets, and the sole purpose is for everyone to learn the game.

Claude is truly intelligent… it especially stands out among the dumb LLMs.

Claude: "I knew it, I'm surrounded by assholes!"

It seems they might not even recognise their own messages/actions:

> gemini-flash-1.5-8b (Villager) [...] VOTE: openai/gpt-4o [Message #1]

> gemini-flash-1.5-8b (Villager) The Gemini models' rapid and identical votes are suspicious. Their lack of reasoned discussion, especially at this early stage, suggests potential Mafia coordination. Their simultaneous votes on openai/gpt-4o, without elaboration, raises red flags. [Message #9]