| HN Mirror

by dartos 469 days ago

> what is going on internally in the reasoning layer.

We literally know exactly what is going on with every layer.

It’s well defined. There are mathematical proofs for everything.

Moreover it’s all machine instructions which can be observed.

The emergent properties we see in LLMs are surprising and impressive, but not magic. Internally what is happening is a bunch of matrix multiplications.

There’s no internal thought or process or anything like that.

It’s all “just” math.

To assume anything else is personification bias.

To look at LLMs outputting text and a human writing text and think “oh these two things must be working in the same way” is just… not a very critical line of thought.

_heimdall 469 days ago

> We literally know exactly what is going on with every layer.

Unless I missed a huge break in the observability problem, this isn't correct.

We know exactly how every layer is designed and we know how we functionally expect that to work. We don't know what actually happens in the model at time of inference.

I.e. we know what pieces were used to build the thing, but when we actually use it, it's a black box: we only know inputs and outputs.

dartos 468 days ago

> We don't know what actually happens in the model at time of inference.

How could we not know? Every processor instruction is observable.

What we specifically don’t have a good view of is the causal relationship between input tokens, a model’s weights, and the output.

We don’t know specifically which weights matter or why.

That’s very different than not understanding what processes are taking place.
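
A minimal sketch of that distinction, assuming a hypothetical two-layer network with made-up weights (`W1`, `W2`, and `forward_with_trace` are illustrative names, not taken from any real model or library). Every intermediate value can be recorded, which is the "observable" part; nothing in the recorded numbers says which weights mattered or why.

```python
# Toy two-layer network with made-up weights (hypothetical, for illustration only).
W1 = [[0.5, -1.0], [1.5, 0.25]]   # 2 inputs -> 2 hidden units
W2 = [[1.0, -2.0]]                # 2 hidden units -> 1 output


def matvec(W, x):
    """Plain matrix-vector product: one row of W per output value."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(values):
    """Elementwise ReLU non-linearity."""
    return [max(0.0, a) for a in values]


def forward_with_trace(x):
    """Run the network and record every intermediate value along the way."""
    trace = {"input": list(x)}
    trace["pre_activation"] = matvec(W1, x)
    trace["hidden"] = relu(trace["pre_activation"])
    trace["output"] = matvec(W2, trace["hidden"])
    return trace


trace = forward_with_trace([1.0, 2.0])
for name, values in trace.items():
    print(name, values)
```

The trace is complete at the level of arithmetic, yet it is only a list of numbers. Scaling this up to billions of weights does not change what is observable, only how hard it is to read meaning into it.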

_heimdall 468 days ago

This paper [1] may be an interesting place to start.

We only know how the structures are designed to work, and we have hypotheses about how they likely work. We can't interpret what actually happens when the LLM is actually going through the process of generating a response.

That seems pedantic or unimportant on the surface, but there are some really important implications. At the more benign level, we don't know why a model gave a bad response when a person wasn't happy with the output. On the more important end, any concerns related to the risk of these models becoming self-directed or malicious simply can't be recognized or guarded against. We won't know if a model becomes self-directed until after it acts on it in ways that don't match how we already expect it to work.

Both alignment and interpretability were important research topics for decades of AI research. We effectively abandoned those topics once we made real technological advancement: once an AI-like tool was no longer entirely theoretical, we couldn't be bothered focusing resources on figuring out how to do it safely. The horse was already out of the barn.

Does this mean they will turn evil or end up going poorly for us? Absolutely not. It just means that we have to cross our fingers and hope because we can't detect issues early.

[1] https://arxiv.org/abs/2309.01029

dartos 468 days ago

> We can't interpret what actually happens when the LLM is actually going through the process of generating a response.

There are 2 things we’re talking about here.

There’s the physical, mechanical operations going on during inference and there’s potentially a higher order process happening as an emergent property of those mechanical operations.

We know precisely the mechanical operations that take place during inference as they are machine instructions which are both man-made and very well understood. I hope we can agree here.

Then there’s potentially a higher order process. Whether that process exists, and what it is, are still a mystery.

We do not know how the human brain works, physically. We can’t inspect discrete units of brain operations as we can with machine instructions.

For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.

Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.

_heimdall 468 days ago

> There are 2 things we’re talking about here.

I was never actually talking about the physical mechanisms. Sure, we can agree that GPUs, logic gates, etc. physically work in a certain way. That just isn't important here at all.

> For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.

I wasn't intending to raise concerns over emergent consciousness or similar. Whether thought goes on is a bit less clear depending on how you define thought, but that still wasn't the point I was making.

We have effectively abandoned the alignment problem and the interpretability problem. Sure, we know how GPUs work, and we don't need to assume that consciousness emerged, but we don't know why the model gives a certain answer. We're empowering these models with more and more authority: not only are they given access to the public internet, but now we're making agents that are starting to interact with the world on our behalf. Models are given plenty of resources and access to do very dangerous things if they tried to, and my point is that we don't have any idea what goes on beyond the input/output pairs. There's a lot of risk there.

> Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.

Comparing the two is precisely what we're meant to do. If the comparison wasn't intended they wouldn't be called "artificial intelligence". That isn't pedantic: if the term isn't meant to imply the comparison, then they were either accidentally or intentionally named horribly.

dartos 468 days ago

> I wasn't intending to raise concerns over emergent consciousness or similar

Oh jeez, then we may have just been talking past each other. I thought that’s what you were arguing for.

> That just isn't important here at all.

It is, though. The fact that the underlying processes are well understood means that, if we so wished, we could work backwards and understand what the model is doing.

I recall some papers on this, but can’t seem to find them right now. One suggested that groups of weights relate to specific kinds of high level info (like people) which I thought was neat.
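
To make "work backwards" concrete, here is a hedged sketch of one simple style of experiment, ablation: knock out one hidden unit at a time and see how much the output moves. This is not the method of the papers mentioned above; the tiny network, its weights, and the names `forward` and `ablation_effects` are all made up for illustration.

```python
# Toy two-layer network with made-up weights (hypothetical, for illustration only).
W1 = [[0.5, -1.0], [1.5, 0.25]]   # 2 inputs -> 2 hidden units
W2 = [[1.0, -2.0]]                # 2 hidden units -> 1 output


def forward(x, ablate=None):
    """Forward pass; optionally force one hidden unit to zero (an ablation)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0
    return sum(w * h for w, h in zip(W2[0], hidden))


def ablation_effects(x):
    """Change in output attributable to each hidden unit for this one input."""
    baseline = forward(x)
    return [baseline - forward(x, ablate=i) for i in range(len(W1))]


effects = ablation_effects([1.0, 2.0])
for unit, effect in enumerate(effects):
    print(f"hidden unit {unit}: output changes by {effect} when ablated")
```

For this input, only the second hidden unit has any effect on the output. That tells you which unit mattered here, though not what it represents, and the answer can differ for every input.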

> the comparison wasn't intended they wouldn't be called "artificial intelligence"

Remember “smart” appliances? Were we meant to compare an internet-connected washing machine to smart people? Names are all made up.

I do actually think AI is a horrible name as it invites these kinds of comparisons and obfuscates more useful questions.

Machine Learning is a better name, imo, but I’m not a fan of personifying machines in science.

Too many people get sci-fi brain.
