| HN Mirror


by famouswaffles 68 days ago

>What research shows that you can ask ChatGPT to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation?

What research shows that you can ask a Human to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation? Because there's no such thing. If anything, what research exists suggests any explanation we're making is a nice post-hoc rationalization after the fact even if the Human thinks otherwise.

https://transformer-circuits.pub/2025/introspection/index.ht...


embedding-shape 68 days ago

Why not try to answer my question, instead of asking a different question which I haven't even claimed to have the answer to?


famouswaffles 68 days ago

I did answer it, albeit not directly. "Guaranteed to be the motivation" isn't a standard anyone can meet, and so framing it that way doesn't really probe anything meaningful about LLMs specifically. If what you want to hear is No, then sure, have your No, but it doesn't mean anything. There's just not much to the question.

Even though you held it up as one born of a greater understanding of LLMs, the interpretability research we have so far, and our current very little understanding of the internal computations of these models does not support your position, and certainly not how assured you are about it.


embedding-shape 67 days ago

> our current very little understanding of the internal computations of these models does not support your position

Our current understanding is sufficient to know that you cannot ask an LLM to explain its behavior and expect it to do so correctly. I'm not sure what research you've read to believe this could be possible in the first place, but I'm happy to receive links to read through, if you're sitting on them.


famouswaffles 67 days ago

Explanations can be faithful sometimes. That's the standard we can expect for any intelligence as far as we're aware.

https://arxiv.org/abs/2504.14150
