| HN Mirror


by _heimdall 468 days ago

This paper [1] may be an interesting place to start.

We only know how the structures are designed to work, and we have hypotheses about how they likely work. We can't interpret what actually happens when the LLM is actually going through the process of generating a response.

That seems pedantic or unimportant on the surface, but there are some really important implications. At the more benign level, we don't know why a model gave a bad response when a person wasn't happy with the output. On the more important end, any concerns related to the risk of these models becoming self-directed or malicious simply can't be recognized or guarded against. We won't know if a model becomes self-directed until after it acts in ways that don't match how we already expect it to work.

Both alignment and interpretability were important research topics for decades of AI research. We effectively abandoned those topics once we made real technological advancement - once an AI-like tool was no longer entirely theoretical we couldn't be bothered focusing resources on figuring out how to do it safely. The horse was already out of the barn.

Does this mean they will turn evil or end up going poorly for us? Absolutely not. It just means that we have to cross our fingers and hope because we can't detect issues early.

[1] https://arxiv.org/abs/2309.01029

1 comments

dartos 468 days ago

> We can't interpret what actually happens when the LLM is actually going through the process of generating a response.

There are 2 things we’re talking about here.

There’s the physical, mechanical operations going on during inference and there’s potentially a higher order process happening as an emergent property of those mechanical operations.

We know precisely the mechanical operations that take place during inference as they are machine instructions which are both man-made and very well understood. I hope we can agree here.

Then there’s potentially a higher order process. The existence of that process, and what that process is, are still a mystery.

We do not know how the human brain works, physically. We can’t inspect discrete units of brain operations as we can with machine instructions.

For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.

Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.


_heimdall 467 days ago

> There are 2 things we’re talking about here.

I was never actually talking about the physical mechanisms. Sure, we can agree that GPUs, logic gates, etc. physically work in a certain way. That just isn't important here at all.

> For that reason, it is uncritical to assume that there is any kind of “thought” process occurring at inference which is similar to our thought processes.

I wasn't intending to raise concerns over emergent consciousness or similar. Whether thought goes on is a bit less clear depending on how you define thought, but that still wasn't the point I was making.

We have effectively abandoned the alignment problem and the interpretability problem. Sure, we know how GPUs work, and we don't need to assume that consciousness emerged, but we don't know why the model gives a certain answer. We're empowering these models with more and more authority: not only are they given access to the public internet, but now we're making agents that are starting to interact with the world on our behalf. Models are given plenty of resources and access to do very dangerous things if they tried to, and my point is we don't have any idea what goes on beyond the input/output pairs we observe. There's a lot of risk there.

> Comparing the two is like apples and oranges anyway and is pedantic in a non-useful way, especially with our limited understanding of the human brain.

Comparing the two is precisely what we're meant to do. If the comparison wasn't intended they wouldn't be called "artificial intelligence". That isn't pedantic; if the term isn't meant to imply the comparison, then they were either accidentally or intentionally named horribly.


dartos 467 days ago

> I wasn't intending to raise concerns over emergent consciousness or similar

Oh jeez, then we may have just been talking past each other. I thought that’s what you were arguing for.

> That just isn't important here at all.

It is, though. The fact that the underlying processes are well understood means that, if we so wished, we could work backwards and understand what the model is doing.

I recall some papers on this, but can’t seem to find them right now. One suggested that groups of weights relate to specific kinds of high level info (like people) which I thought was neat.

> the comparison wasn't intended they wouldn't be called "artificial intelligence"

Remember “smart” appliances? Were we meant to compare an internet connected washing machine to smart people? Names are all made up.

I do actually think AI is a horrible name as it invites these kinds of comparisons and obfuscates more useful questions.

Machine Learning is a better name, imo, but I’m not a fan of personifying machines in science.

Too many people get sci-fi brain.


_heimdall 467 days ago

Haha, well it's funny sometimes when you realize too late there were two different conversations happening.

I definitely agree on the term machine learning - it seems a much better fit but still doesn't feel quite right. Naming things is hard, but AI seems particularly egregious here.

> The fact that the underlying processes are well understood means that, if we so wished, we could work backwards and understand what the model is doing.

I'm not sure we can take that leap. We understand pretty well how a neuron functions but we understand very little about how the brain works or how it relates to what we experience. We understand how light is initially recognized in the eye with cones and rods, but we don't really know exactly how it goes from there to what we experience as vision.

In complex systems it's often easy to understand the function of a small, more fundamental bit of the system. It's much harder to understand the full system, and if you do you should be able to predict it. For LLMs, that would mean we could predict a model's output for a given input (even if that prediction has to account for randomness added into the inference algorithm).
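To make the "randomness added into the inference algorithm" part concrete, here is a minimal sketch of temperature sampling over a toy vocabulary. The logits, the temperature value, and the helper names (`softmax`, `greedy_token`, `sample_sequence`) are all made up for illustration and aren't taken from any particular model or library. It only shows that the sampling step itself is easy to account for: with a fixed seed the draws repeat exactly, and with greedy decoding there is no randomness at all. The part nobody can currently predict is which logits the network will produce in the first place.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_sequence(logits, temperature, seed, n):
    """Draw n token indices from the same distribution using a seeded RNG."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    indices = list(range(len(probs)))
    return [rng.choices(indices, weights=probs, k=1)[0] for _ in range(n)]


# Hypothetical logits for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits, temperature=0.8)
run_a = sample_sequence(logits, temperature=0.8, seed=42, n=10)
run_b = sample_sequence(logits, temperature=0.8, seed=42, n=10)

print("probabilities:", [round(p, 3) for p in probs])
print("greedy token:", greedy_token(logits))
print("same seed gives same samples:", run_a == run_b)
```

So the randomness is something a prediction could account for, either as a known distribution or by fixing the seed. Predicting the logits themselves without just running the model is the hard part.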
