by astrange 791 days ago
> demonstrate that the LLM will deny that they know fact X (or be flaky about it, randomly denying and divulging the fact)

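No, the sampling algorithm you used to query the LLM does that, not the model itself. The model only produces a probability distribution over next tokens; whether a given answer shows up depends on how you sample from it.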

e.g. https://arxiv.org/pdf/2306.03341.pdf

> B. They know they know fact X, but avoid acknowledging for ... reasons?

The reason is that the sampling algorithm didn't successfully sample the answer.
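
To make that concrete, here's a toy sketch (not a real LLM). The logits, the candidate answers, and the helper names are all made up for illustration. The "model" is a fixed distribution that never changes between calls, yet temperature sampling still flips between giving the fact and denying it, while greedy decoding returns the same answer every time.

```python
import math
import random

# Hypothetical next-token logits for one fixed prompt; the values are invented.
LOGITS = {"Paris": 2.0, "I don't know": 1.2, "Lyon": 0.1}

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, scaled by temperature."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(logits, key=logits.get)

def sample(logits, temperature, rng):
    """Stochastic decoding: draw one token from the softmax distribution."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

rng = random.Random(0)  # seeded so the run is repeatable
draws = [sample(LOGITS, 1.0, rng) for _ in range(1000)]
counts = {tok: draws.count(tok) for tok in LOGITS}

print("greedy:", greedy(LOGITS))
print("sampled:", counts)
```

Same distribution every time; only the decoding step decides whether the "denial" shows up in a given run.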