by andai
72 days ago
What's the implication of this? That the model has already decided on a solution upon first seeing the problem, and the reasoning is post hoc rationalization? But reasoning does improve performance on many tasks and, even weirder, performance still improves if the reasoning tokens are replaced with placeholder tokens like "...". I don't understand how LLMs actually work; I guess there's some internal state getting nudged with each cycle? So the internal state converges on the right solution even if the output tokens are meaningless placeholders?
|
Yes, it plans ahead, but with significant uncertainty until it actually outputs those tokens and commits to a definite trajectory, so the intermediate tokens aren't useless filler. The closer the model gets to a given point in the output, the more certain it is about it, somewhat like what happens explicitly in diffusion models. And that isn't the whole story; it's just one of many competing phenomena.
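
For what it's worth, here is a toy sketch of the "internal state" idea from the question. It assumes a decoder-only transformer, where the state carried from one token to the next is the set of per-position keys and values that later positions attend to. Everything here is hypothetical and made up for illustration: the two-layer model, the random weights, the token names `Q1`, `Q2`, `A:`, and the helper `final_state`. There are no positional encodings and nothing is trained, so it says nothing about whether filler tokens improve accuracy. It only shows that a filler position holds state computed from the prompt, and that the final position's state changes when filler positions are present.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def causal_attention(xs, Wq, Wk, Wv):
    """One causal attention head plus a residual connection.
    Position i only attends to positions 0..i."""
    keys = [matvec(Wk, x) for x in xs]    # one key per position (the "cache")
    values = [matvec(Wv, x) for x in xs]  # one value per position
    out = []
    for i, x in enumerate(xs):
        q = matvec(Wq, x)
        w = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys[: i + 1]])
        # zip stops at len(w) == i + 1, so later positions are never read
        mixed = [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(x))]
        out.append([a + b for a, b in zip(x, mixed)])
    return out

def final_state(tokens, embed, layers):
    """Hidden state at the last position after all layers (hypothetical helper)."""
    xs = [embed[t] for t in tokens]
    for Wq, Wk, Wv in layers:
        xs = causal_attention(xs, Wq, Wk, Wv)
    return xs[-1]

# Assumption: random, untrained weights; only the data flow is meaningful.
rng = random.Random(0)
DIM = 4

def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

embed = {t: [rng.uniform(-1, 1) for _ in range(DIM)] for t in ["Q1", "Q2", "...", "A:"]}
layers = [(rand_matrix(), rand_matrix(), rand_matrix()) for _ in range(2)]

def max_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# The same filler token ends up with a different state under different prompts.
filler_after_q1 = final_state(["Q1", "..."], embed, layers)
filler_after_q2 = final_state(["Q2", "..."], embed, layers)

# The answer position sees extra prompt-dependent positions when fillers are added.
no_filler = final_state(["Q1", "Q2", "A:"], embed, layers)
with_filler = final_state(["Q1", "Q2", "...", "...", "...", "A:"], embed, layers)

print("filler state depends on prompt:", max_diff(filler_after_q1, filler_after_q2) > 1e-9)
print("answer state changes with filler:", max_diff(no_filler, with_filler) > 1e-9)
```

So even when the emitted token is a meaningless "...", the position it occupies still gets its own keys and values computed from everything before it, and later positions can read them. Whether a real model uses that extra room productively is a separate, empirical question.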