| HN Mirror



	by famouswaffles 68 days ago
	I did answer it, albeit not directly. "Guaranteed to be the motivation" isn't a standard anyone can meet, so framing it that way doesn't probe anything meaningful about LLMs specifically. If what you want to hear is No, then sure, have your No, but it doesn't mean anything. There's just not much to the question. Even though you held it up as one born of a greater understanding of LLMs, the interpretability research we have so far, and our current very little understanding of the internal computations of these models does not support your position, and certainly not how assured you are about it.

1 comments

embedding-shape 67 days ago

> our current very little understanding of the internal computations of these models does not support your position

Our current understanding is sufficient to know that you cannot ask an LLM to explain its behavior and expect it to do so correctly. I'm not sure what research you've read that led you to believe this could be possible in the first place, but I'm happy to receive links to read through, if you're sitting on them.


famouswaffles 67 days ago

Explanations can be faithful sometimes. That's the standard we can expect for any intelligence as far as we're aware.

https://arxiv.org/abs/2504.14150
