by markwkw 791 days ago

You can easily demonstrate that an LLM knows a certain fact X AND demonstrate that the same LLM will deny knowing fact X (or be flaky about it, randomly denying and divulging the fact).

There are two explanations: A. They lack self-reflection. B. They know they know fact X, but avoid acknowledging it for ... reasons?

I find the argument for A quite compelling.

2 comments

astrange 791 days ago

> demonstrate that the LLM will deny that they know fact X (or be flaky about it, randomly denying and divulging the fact)

No, the sampling algorithm you used to query the LLM does that. Not the model itself.

e.g. https://arxiv.org/pdf/2306.03341.pdf

> B. They know they know fact X, but avoid acknowledging for ... reasons?

That reason being that the sampling algorithm didn't successfully sample the answer.
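To make the sampling point concrete, here is a toy sketch in Python. It is not the method from the linked paper, and the tokens and logits are invented for illustration (no real model is queried). It only shows that one fixed output distribution can yield a stable answer under greedy decoding and a flaky one under temperature sampling.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(tokens, logits):
    """Always pick the highest-scoring candidate."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]

def sample(tokens, logits, temperature, rng):
    """Draw one candidate at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidate answers and scores; the "model" never changes.
TOKENS = ["Paris", "I don't know", "Lyon"]
LOGITS = [2.0, 1.5, 0.1]

rng = random.Random(0)  # seeded so runs are repeatable
greedy_answers = {greedy(TOKENS, LOGITS) for _ in range(100)}
sampled_answers = [sample(TOKENS, LOGITS, 1.0, rng) for _ in range(1000)]

print("greedy :", greedy_answers)
print("sampled:", {t: sampled_answers.count(t) for t in TOKENS})
```

Greedy decoding returns the same answer every time, while sampling from the identical scores sometimes returns the fact and sometimes the denial. Whether that fully accounts for the behavior described upthread is a separate question; the sketch only illustrates the mechanism.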


throwaway290 791 days ago

They will say "it's just a bad LLM", don't bother
