|
|
|
|
|
by 112233
254 days ago
|
|
There is no mechanism in transformer architecture for "internal" thinking ahead, or hierarchical generation. Attention only looks back from current token, ensuring that the model always falls into local maximum, even if it only leads to bad outcomes. |
|
> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.
They described the mechanism that it uses internally for planning [2]:
> Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.
> Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.
[1]: https://www.anthropic.com/research/tracing-thoughts-language...
[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...