by markwkw
791 days ago
You can easily demonstrate that an LLM knows a certain fact X, and also demonstrate that the same LLM will deny knowing fact X (or be flaky about it, randomly denying and divulging the fact).
There are two explanations:
A. They lack self-reflection
B. They know they know fact X, but avoid acknowledging for ... reasons?
I find the argument for A quite compelling.
|
No, the sampling algorithm you used to query the LLM does that, not the model itself.
e.g. https://arxiv.org/pdf/2306.03341.pdf
> B. They know they know fact X, but avoid acknowledging for ... reasons?
That reason is that the sampling algorithm failed to sample the answer.
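To make the sampling point concrete, here is a rough Python sketch. The two-way "answer" distribution and the logit values are made up for illustration and are not taken from any real model; the only claim is that a fixed distribution can look flaky under temperature sampling while greedy decoding over the same logits is deterministic.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, scaled by a sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic decoding: always pick the highest-scoring option."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temperature, rng):
    """Stochastic decoding: draw one option from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical values: index 0 = "states fact X", index 1 = "denies knowing X".
# The model's scores favor stating the fact, but not overwhelmingly.
ANSWERS = ["states fact X", "denies knowing X"]
LOGITS = [1.0, 0.5]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample(LOGITS, 1.0, rng) for _ in range(1000)]
denial_rate = draws.count(1) / len(draws)

print("greedy:", ANSWERS[greedy(LOGITS)])
print("sampled denial rate:", denial_rate)
```

With these made-up logits the softmax puts roughly 38% of the mass on the denial, so repeated sampling at temperature 1.0 looks like the model "randomly denying and divulging" the fact, even though the logits never change between queries.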