by sohex
4 hours ago
Sonnet, GPT-5.2, and Gemini Flash, in a set of 21 games, where conclusions are drawn from the LLMs' self-reported reasoning. This is like writing a paper about kids in a literal sandbox fighting over ‘territory’. The models employed don’t indicate the actual extent of machine reasoning, even as we currently recognize it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between the actual and the reported mechanism. “Chilling” shouldn’t be the takeaway here.
Regardless of what the capabilities of the models are, they will be used in every situation possible.