Hacker News
by sohex 4 hours ago
Sonnet, GPT-5.2, Gemini Flash, in a set of 21 games, where conclusions are drawn from the LLMs' self-reported reasoning.

This is like writing a paper about kids in a literal sandbox fighting over ‘territory’.

The models employed don’t indicate the actual extent of machine reasoning, even as we currently recognize it. They certainly don’t have the metacognition necessary to accurately understand their own reasoning. As we’ve seen with recent papers on how LLMs do math, there’s a complete disconnect between the actual and the reported mechanism.

“Chilling” shouldn’t be the takeaway here.

3 comments

So take the context you just laid out and apply it to this: "Artificial Intelligence Strategy for the Department of War" https://media.defense.gov/2026/Jan/12/2003855671/-1/-1/0/art...

Regardless of what the capabilities of the models are, they will be used in every situation possible.

> “Chilling” shouldn’t be the takeaway here.

It is when you consider the personality currently occupying the office of US SecDef.

LLMs have already been used to bomb schoolgirls; chilling is absolutely the operative word to use here, especially since these delusional fools want to incorporate LLMs into everything.
Forgive my ignorance, but were LLMs involved in that decision? I don't remember hearing anything to that effect, but we're so bombarded by news these days that I guess I could just be forgetting.
Perhaps not in that one, but in plenty more: https://www.972mag.com/lavender-ai-israeli-army-gaza/
Yes, our government purportedly used technology to work up a list of targets in the Iran debacle as well, just not with an LLM, a distinction that to me just isn't that meaningful.

https://www.theguardian.com/news/2026/mar/26/ai-got-the-blam...