| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pierrekin 94 days ago

I agree that the model can help troubleshoot and debug itself.

I argue that the model has no access to its thoughts at the time.

Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.

If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).

3 comments

XenophileJKO 94 days ago

Anthropic's introspection experiments have seemed to show that your argument is falsifiable.

https://www.anthropic.com/research/introspection

link

sumeno 94 days ago

> In fact, most of the time models fail to demonstrate introspection—they’re either unaware of their internal states or unable to report on them coherently.

You got the wrong takeaway from your link.

link

XenophileJKO 94 days ago

The parent said: "I argue that the model has no access to its thoughts at the time."

This is falsified by that study, showing that on the frontier models generalized introspection does exist. It isn't consistent, but is is provable.

"no access" vs. "limited access"

link

sumeno 94 days ago

There is no way for a user to know whether the LLM has introspection in a given case or not, and given that the answer is almost always no it is much better for everyone to assume that they do not have introspection.

You cannot trust that the model has introspection so for all intents and purposes for the end user it doesn't.

link

dwheeler 94 days ago

I would say "limited and unreliable access". What it says is the cause might be the cause, but it's not on any way certain.

link

fragmede 94 days ago

Claude code and codex both hide the Chain of Thought (CoT) but it's just words inside a set of <thinking> tags </thinking> and the agent within the same session has access to that plaintext.

link

fc417fc802 94 days ago

Those are just words inside arbitrary tags, they aren't actually thoughts. Think of it as asking the model to role play a human narrating his internal thought process. The exercise improves performance and can aid in human understanding of the final output but it isn't real.

link

antonvs 94 days ago

Why do you believe that humans have access to an “internal thought process”? I.e. what do you think is different about an agent’s narration of a thought process vs. a human’s?

I suspect you’re making assumptions that don’t hold up to scrutiny.

link

fc417fc802 94 days ago

I made no such claim and I don't understand what direct relevance you believe the human thought process has to the issue at hand.

You appear to be defaulting to the assumption that LLMs and humans have comparable thought processes. I don't think it's on me to provide evidence to the contrary but rather on you to provide evidence for such a seemingly extraordinary position.

For an example of a difference, consider that inserting arbitrary placeholder tokens into the output stream improves the quality of the final result. I don't know about you but if I simply repeat "banana banana banana" to myself my output quality doesn't magically increase.

link

antonvs 93 days ago

> I don't understand what direct relevance you believe the human thought process has to the issue at hand.

You're the one who raised it. Perhaps you should clarify what you mean by "isn't real" - do you believe a human narrating their thought process is saying something that's more real?

Someone else replied to your comment asking essentially the same question, perhaps better phrased:

> What would be different if it was "real"? What makes you think that when humans "narrate" "their" "internal thought process", it's any more "real"?

link

fc417fc802 93 days ago

No, I did not raise it. I said that X is false. You responded with "why do you think Y is true" and now you ask "do you believe that Y is true" neither of which is relevant to X being true or false. Humans and LLMs are not the same thing. The colloquial term for this is whataboutism.

What do I mean by isn't real? Exactly what I said originally. It's a roleplay of something that sounds plausible as opposed to what actually happened. There is obviously some process that is producing the output. The thinking trace is not a representation of that underlying process. Rather the thinking trace is an adjacent output of that same process.

link

DiogenesKynikos 93 days ago

Given that LLMs can speak basically any language and answer almost any arbitrary question much like a human would, the claim that LLMs have comparable (not identical) thought processes to humans does not seem extraordinary at all.

link

yladiz 93 days ago

Are you legitimately arguing that humans don’t have an internal thought process in some way?

link

vidarh 93 days ago

They're arguing that we have no evidence that humans have access to our underlying thoughts any more than the models do.

link

yladiz 93 days ago

What does that mean though, to “have access to our underlying thoughts”? Humans can obviously mentally do things that are impossible for a language model to do, because it’s trivial to show that humans do not need language to do mental tasks, and this includes things related to thought, so I don’t really get what is being argued in the first place.

link

lmm 93 days ago

What would be different if it was "real"? What makes you think that when humans "narrate" "their" "internal thought process", it's any more "real"?

link

fc417fc802 93 days ago

I ask a human "predict what a mouse would do here". In an effort to understand why the prediction is sometimes wrong I ask "walk me through what the imaginary mouse is thinking". Upon examination I exclaim "aha! there's the error" but sadly it's not actually because the output prediction was not based on the thinking trace in any robust manner.

That's a loose analogy but it fails to fully illustrate the degree of decoupling here. For example the weirdness of LLM performance being increased via the output of empty sequences.

link

lmm 92 days ago

> I ask a human "predict what a mouse would do here". In an effort to understand why the prediction is sometimes wrong I ask "walk me through what the imaginary mouse is thinking". Upon examination I exclaim "aha! there's the error" but sadly it's not actually because the output prediction was not based on the thinking trace in any robust manner.

Is this meant to be an analogy for a human or an LLM? Where would it be different in the other case?

link

jmalicki 94 days ago

It does have access to its thoughts. This is literally what thinking models do. They write out thoughts to a scratch pad (which you can see!) and use that as part of the prompt.

link

fc417fc802 94 days ago

It's important to be aware that while those "thoughts" can be a useful aid for human understanding they don't seem to reliably reflect what's going on under the hood. There are various academic papers on the matter or you can closely inspect the traces of a more logically oriented question for yourself and spot impossible inconsistencies.

link

mmoll 94 days ago

It doesn’t mean that these “thoughts” influenced their final decision the way they would in humans. An LLM will tell you a lot of things it “considered” and its final output might still be completely independent of that.

link

jmalicki 94 days ago

Its output quite literally is not independent, as the "thinking tokens" are attended to by the attention mechanism.

link

grey-area 94 days ago

They do not in fact do that. The ‘thoughts’ are not a chain of logic.

link

sumeno 94 days ago

You have a fundamental misunderstanding of what the model is doing. It's not your fault though, you're buying into the advertising of how it works

link

eleumik 93 days ago

Those are a funny progress bar made by a micro model , is just ui

link