by _0ffh 466 days ago
Please don't confuse people with wrong information. The reasoning part in reasoning models is the exact same LLM that produces the final answer. For example, o1 uses special "thinking" tokens to demarcate the reasoning and answer sections of its output.
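
To illustrate the point about a single output stream, here is a minimal sketch of splitting one token stream into reasoning and answer sections. The `<think>` and `</think>` markers are hypothetical placeholders, not the actual tokens any particular model uses, and `split_reasoning` is a name made up for this example.

```python
# Hypothetical delimiter tokens (an assumption for illustration only;
# the real markers used by any given model may differ).
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(tokens):
    """Split a single token stream into (reasoning, answer) lists."""
    reasoning, answer = [], []
    in_reasoning = False
    for tok in tokens:
        if tok == THINK_OPEN:
            in_reasoning = True
        elif tok == THINK_CLOSE:
            in_reasoning = False
        elif in_reasoning:
            reasoning.append(tok)
        else:
            answer.append(tok)
    return reasoning, answer


# One stream, produced by one model, carrying both sections.
stream = [THINK_OPEN, "2", "+", "2", "is", "4", THINK_CLOSE,
          "The", "answer", "is", "4"]
reasoning, answer = split_reasoning(stream)
print(" ".join(reasoning))
print(" ".join(answer))
```

The only thing separating "reasoning" from "answer" here is where the marker tokens fall in the stream; there is no second model involved.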

Sure, that's a great clarification, though maybe a bit of an implementation detail in this context.

Functionally my argument stands in this context - just because we can see one stream of LLM responses responding to the primary response stream says nothing of reasoning or what is going on internally in the reasoning layer.

> what is going on internally in the reasoning layer.

We literally know exactly what is going on with every layer.

It’s well defined. There are mathematical proofs for everything.

Moreover it’s all machine instructions which can be observed.

The emergent properties we see in LLMs are surprising and impressive, but not magic. Internally what is happening is a bunch of matrix multiplications.

There’s no internal thought or process or anything like that.

It’s all “just” math.

To assume anything else is personification bias.

To look at LLMs outputting text and a human writing text and think “oh these two things must be working in the same way” is just… not a very critical line of thought.

> We literally know exactly what is going on with every layer.

Unless I missed a huge break in the observability problem, this isn't correct.

We know exactly how every layer is designed and we know how we functionally expect that to work. We don't know what actually happens in the model at time of inference.

I.e. we know what pieces were used to build the thing, but when we actually use it, it's a black box: we only know inputs and outputs.

> We don't know what actually happens in the model at time of inference.

How could we not know? Every processor instruction is observable.

What we specifically don’t have a good view of is the causal relationship between input tokens, a model’s weights, and the output.

We don’t know specifically what weights matter or why.

That’s very different than not understanding what processes are taking place.
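
To make that distinction concrete, here is a toy sketch (not a real LLM, and all weights and names are made up for illustration). Every intermediate value in the forward pass can be recorded, which is the "observable" part. Working out which weights matter for a given output takes a separate experiment, here a crude ablation that zeroes one weight at a time.

```python
import copy


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(W1, W2, x, trace=None):
    """Two-layer toy network; optionally record every intermediate value."""
    h_pre = matvec(W1, x)
    h = relu(h_pre)
    y = matvec(W2, h)
    if trace is not None:
        trace.update({"h_pre": h_pre, "h": h, "y": y})
    return y


def ablate(W1, W2, x, i, j):
    """Output when a single first-layer weight W1[i][j] is zeroed."""
    W1_cut = copy.deepcopy(W1)
    W1_cut[i][j] = 0.0
    return forward(W1_cut, W2, x)[0]


# Made-up weights and input, chosen only to keep the arithmetic readable.
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 1.0]

# The mechanical part: every operation and activation is fully visible.
trace = {}
baseline = forward(W1, W2, x, trace)
print(trace)

# The causal part: how much does the output move if one weight is removed?
effects = {
    (i, j): baseline[0] - ablate(W1, W2, x, i, j)
    for i in range(len(W1))
    for j in range(len(W1[0]))
}
print(effects)
```

With four weights this is trivial to enumerate. The open question in the thread is what an equivalent explanation looks like at billions of weights, where seeing every multiplication still doesn't tell you why a particular output came out.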

This paper [1] may be an interesting place to start.

We only know how the structures are designed to work, and we have hypotheses about how they likely work. We can't interpret what actually happens while the LLM is generating a response.

That seems pedantic or unimportant on the surface, but there are some really important implications. At the more benign level, we don't know why a model gave a bad response when a person wasn't happy with the output. On the more important end, any concerns related to the risk of these models becoming self-directed or malicious simply can't be recognized or guarded against. We won't know if a model becomes self-directed until after it acts on it in ways that don't match how we already expect them to work.

Both alignment and interpretability were important research topics for decades of AI research. We effectively abandoned those topics once we made real technological advancement: once an AI-like tool was no longer entirely theoretical, we couldn't be bothered focusing resources on figuring out how to do it safely. The horse was already out of the barn.

Does this mean they will turn evil or end up going poorly for us? Absolutely not. It just means that we have to cross our fingers and hope because we can't detect issues early.

[1] https://arxiv.org/abs/2309.01029

> We can't interpret what actually happens when the LLM is actually going through the process of generating a response.

There are 2 things we’re talking about here.

There are the physical, mechanical operations going on during inference, and there’s potentially a higher order process happening as an emergent property of those mechanical operations.

We know precisely the mechanical operations that take place during inference as they are machine instructions which are both man-made and very well understood. I hope we can agree here.

Then there’s potentially a higher order process. Whether that process exists, and what it is, are still a mystery.

We do not know how the human brain works, physically. We can’t inspect discrete units of brain operations as we can with machine instructions.

For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.

Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.