Hacker News
by
mohsen1
147 days ago
I built Mafia Arena to measure how well each LLM plays Mafia/Werewolf.
https://mafia-arena.com
It is a good benchmark of how well AIs can lie.
1 comment
littlestymaar
147 days ago
Something is off with the numbers. GPT-5.2 cannot have a 75% win rate with one win over GLM-4.7 and a 2/10 record against Gemini 3 Flash.
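A rough sanity check of that arithmetic, as a sketch only. It assumes "2/10" means 2 wins in 10 games, that the overall win rate is simply total wins over total games, and that these two matchups are the only ones counted; none of that is confirmed by the site. The helper name `pooled_win_rate` is made up for illustration.

```python
from fractions import Fraction

def pooled_win_rate(records):
    """Overall win rate from a list of (wins, games) pairs, one per opponent."""
    wins = sum(w for w, _ in records)
    games = sum(g for _, g in records)
    return Fraction(wins, games)

# Most favorable case (assumption): the single win over GLM-4.7 came from
# a single game, plus 2 wins in 10 games against Gemini 3 Flash.
best_case = pooled_win_rate([(1, 1), (2, 10)])
print(f"best case: {float(best_case):.1%}")  # best case: 27.3%

# More games against GLM-4.7 with still only one win can only lower the rate.
more_games = pooled_win_rate([(1, 5), (2, 10)])
print(f"with 5 GLM games: {float(more_games):.1%}")  # with 5 GLM games: 20.0%
```

Under these assumptions the pooled rate tops out at 3/11, so a 75% figure would need either other matchups not shown or a different definition of win rate.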