HN Mirror


by andai 72 days ago

What's the implication of this? That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

But reasoning does improve performance on many tasks, and even weirder, the performance improves if reasoning tokens are replaced with placeholder tokens like "..."

I don't understand how LLMs actually work. I guess there's some internal state getting nudged with each cycle?

So the internal state converges on the right solution, even if the output tokens are meaningless placeholders?
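To make the "internal state getting nudged" picture concrete, here is a toy sketch in Python. It is not a real LLM: `toy_step` and `decode` are hypothetical names, and the arithmetic is an arbitrary stand-in for a forward pass. It only illustrates that each emitted token, even a "..." placeholder, triggers another pass and adds another entry to the state that later steps can read. It says nothing about why that would help accuracy.

```python
# Toy sketch (assumption: NOT a real model). Each decoding step appends one
# entry to a cache that all later steps can read, whatever token was emitted.

def toy_step(cache, token):
    """Hypothetical stand-in for one forward pass over the cached context."""
    # The new entry depends on everything cached so far plus the new token.
    hidden = (sum(cache) + len(token) + 1) % 97
    return cache + [hidden]

def decode(prompt_tokens, filler, n_steps):
    """Feed the prompt, then emit `filler` n_steps times; return the cache."""
    cache = []
    for tok in prompt_tokens:
        cache = toy_step(cache, tok)
    for _ in range(n_steps):
        cache = toy_step(cache, filler)
    return cache

prompt = ["2", "+", "2", "="]
state_short = decode(prompt, "...", 0)  # answer immediately
state_long = decode(prompt, "...", 5)   # five placeholder tokens first

print(len(state_short), len(state_long))  # 4 9
```

The longer run shares the same first four entries and then keeps accumulating state, so the final step has more computed context to draw on even though the visible tokens carry no meaning.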

2 comments

orbital-decay 71 days ago

>That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

Yes, it plans ahead, but with significant uncertainty until it actually outputs those tokens and converges on a definite trajectory, so the reasoning isn't useless filler: the closer the model is to a given point, the more certain it is about it, somewhat like what happens explicitly in diffusion models. And that's not all that happens; it's just one of many competing phenomena.


not_that_d 72 days ago

> I don't understand how LLMs actually work...

Plot twist: they don't either. They just throw more hardware at it and try things until something sticks.
