It's funny that reasoning models sometimes speak nonsense and perform worse than well-aligned models like claude-3.5-sonnet in multi-turn games like Akinator. I think it's a current weak point of long-CoT RL compared with instruction-following alignment. Maybe we need to find a way to address both? It would be interesting to see some results.
I played the game and found hard mode to be an exciting challenge—it's incredibly fun, and the AI is so clever it even guessed my intentions in the taboo game!
When superintelligence arrives, it would be very interesting to see multi-party gameplay among AIs too. What role humans would play in this story is unclear. Maybe humans can't directly engage in the games either, as they are too naive and would be immediately identified and exploited by AI :)