Hacker News
by 112233 254 days ago
There is no mechanism in the transformer architecture for "internal" thinking ahead, or for hierarchical generation. Attention only looks back from the current token, ensuring that the model always falls into a local maximum, even if that only leads to bad outcomes.
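
To make the "attention only looks back" part concrete, here is a minimal sketch of a causal attention mask in plain Python. The function name `causal_mask` and the toy token list are made up for illustration; real implementations apply the same lower-triangular pattern to the attention scores before the softmax.

```python
def causal_mask(n):
    """Return an n x n matrix; entry [i][j] is True if position i may read position j."""
    # Position i can read itself and everything before it, never anything after.
    return [[j <= i for j in range(n)] for i in range(n)]


tokens = ["He", "saw", "a", "carrot"]  # toy sequence, purely illustrative
mask = causal_mask(len(tokens))

for i, tok in enumerate(tokens):
    visible = [tokens[j] for j in range(len(tokens)) if mask[i][j]]
    print(f"{tok!r} can attend to {visible}")
```

The mask only restricts which positions are read. It says nothing about what the hidden state at each earlier position encodes.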
2 comments

Not strictly true: while this was previously believed to be the case, Anthropic demonstrated that transformers can "think ahead" in some sense, for example when planning rhymes in a poem [1]:

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

They described the mechanism that it uses internally for planning [2]:

> Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.

> Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.

[1]: https://www.anthropic.com/research/tracing-thoughts-language...

[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...

Thank you for these links! Their "circuits" research is fascinating. In the example you mention, note how the planned rhyme is piggybacking on the newline token. The internal state that the emergent circuits can use is mapped 1:1 to the tokens. The model cannot trigger the insertion of a "null" token for the purpose of storing this plan-ahead information during inference. Nor are there any sort of "registers" available aside from the tokens. The "thinking" LLMs are not quite that, because the thinking tokens are still forced to become text.

That's what reasoning models are for. You can get most of the benefit by having the model state an answer once in the reasoning section, because it can then read that draft back when it outputs the answer again in the answer section.

It could also have a "delete and revise" token, though you'd have to figure out how to teach the model to actually use it.
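
A rough sketch of what the decoding side of that could look like, assuming a hypothetical `<del>` control token that erases the previously emitted token. Neither the token nor the `apply_revisions` helper exists in any real tokenizer or library that I know of; this only shows the post-processing, not the hard part (training the model to emit it usefully).

```python
DELETE = "<del>"  # hypothetical control token, not part of any real vocabulary


def apply_revisions(tokens):
    """Collapse a raw token stream by letting each DELETE erase the previous kept token."""
    out = []
    for tok in tokens:
        if tok == DELETE:
            if out:  # a DELETE with nothing before it is simply ignored
                out.pop()
        else:
            out.append(tok)
    return out


raw = ["The", "answer", "is", "41", DELETE, "42"]
print(" ".join(apply_revisions(raw)))
```

Note that the deleted tokens would still sit in the context the model attends over unless the inference stack also dropped them from the cache, which is a separate design decision.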

Given how badly most models degrade once they reach a particular context size (any whitepapers on this welcome), reasoning does seem like a quick hack rather than a thought-out architecture.