by djhn 42 days ago

I’d be genuinely interested to see someone counter the following evidence, which supports the authors.

Plane of words: broadly correct. Everything is flattened to tokens and token sequences, and the training data is dominated by text tokens.

Reasoning: CoT tokens are mostly just tokens, more appropriately called intermediate tokens, and are largely disconnected from the end result. Including them improves the end result (user satisfaction), but does not imply reasoning. See for example Turpin 2023, Mirzadeh 2024, Pournemat 2025, Palod 2025.

Synthesising evidence: You can achieve SOTA summaries with LLMs, but this involves, for example, using a harness to generate dozens of summaries with different models, separately using some kind of vector embedding model to compare the results to the original, and selecting the best match. This is not how most people use LLMs for summaries. While this behaviour is slowly being RLVR’d in post-training, a naive one-shot summary significantly underperforms more complex methods.
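To make the "generate many, embed, pick the best match" idea concrete, here is a rough sketch of the selection step only. It is an illustration under stated assumptions, not anyone's actual harness: `embed`, `cosine`, and `pick_best_summary` are hypothetical names, the bag-of-words vector is a toy stand-in for a real embedding model, and the candidate summaries are assumed to have already been produced by separate model calls.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best_summary(source, candidates):
    """Return the candidate whose vector is closest to the source text's."""
    source_vec = embed(source)
    return max(candidates, key=lambda c: cosine(source_vec, embed(c)))


if __name__ == "__main__":
    source = "the cat sat on the mat"
    # In a real harness these would come from many runs of different models.
    candidates = [
        "a dog barked loudly",
        "the cat sat on a mat",
        "stock prices fell",
    ]
    print(pick_best_summary(source, candidates))
```

The point of the sketch is that the ranking happens outside the LLM: the model only proposes candidates, and a separate similarity measure decides which one is kept.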


simianwords 42 days ago

What? Reasoning models are inventing proofs for unsolved open problems in mathematics. That is my benchmark for reasoning.

djhn 42 days ago

I think I know the examples you’re talking about. They don’t show much in terms of reasoning.

The Erdős problems have turned out to be largely brute force or finding older results.

The Feb 2026 GPT-5.2 theoretical physics paper was the result of a “dialogue between physicists and LLMs”, was called “grad student level” by experts in the field, and used a “custom harnessed” “internal OpenAI” model with “20 hours of reasoning”. The quotes are from the OpenAI blog.

The Matthew Schwartz physics paper with Claude this March involved “51,248 messages across 270 sessions, producing over 110 draft versions and consuming 36 million tokens”, and the actual contribution was Schwartz finding an error in Claude’s solution.

manoDev 41 days ago

Is symbol manipulation reasoning? If so, machines have always been capable of reasoning; we just instructed them in a language other than English.