HN Mirror

Hacker News


	by glymor 724 days ago
	TL;DR: sample N answers from the LLM and use traditional NLP to extract the factoids from each. If the LLM is confabulating, the factoids will be spread more or less randomly across different answers; if it isn't, they will be heavily weighted towards one answer. A figure from the paper shows this better than my TL;DR: https://www.nature.com/articles/s41586-024-07421-0/figures/1
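A minimal Python sketch of the idea in the TL;DR, assuming the N sampled answers are already in hand as short strings. The paper's own measure (semantic entropy) groups answers by meaning using an entailment model; here a crude normalization (`normalize`, a hypothetical helper) stands in for that NLP step, so only near-identical phrasings land in the same group. Low entropy means the samples agree; high entropy suggests confabulation.

```python
import math
import re
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for the NLP step: lowercase, drop punctuation,
    # collapse whitespace so trivially different phrasings match.
    text = re.sub(r"[^\w\s]", "", answer.lower())
    return " ".join(text.split())


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (nats) of the distribution over normalized answers."""
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    # p * log(1/p) summed over distinct answers
    return sum((c / total) * math.log(total / c) for c in counts.values())


consistent = ["Paris.", "paris", "Paris", "Paris!"]
scattered = ["Lyon", "Marseille", "Nice", "Paris"]
print(answer_entropy(consistent))  # all samples agree: entropy 0
print(answer_entropy(scattered))   # four distinct answers: log(4), about 1.386
```

Where to put the cutoff between "consistent" and "confabulating" is a tuning choice; the sketch only produces the score.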

3 comments

Terr_ 723 days ago

The LLM is already generating factoids: Things which resemble a fact without actually being one.

(See also: Androids that resemble men but aren't, asteroids that resemble stars but aren't, meteoroids that resemble meteors but aren't...)


cl42 723 days ago

Thank you! This is so helpful.

It's also interesting to see which temperature values they use (1.0, and 0.1 in some cases?). I have a feeling that using the raw probability estimates (if available) would provide a lot of information without having to rerun the LLM or sample as heavily.
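A small Python sketch of what the raw probabilities can tell you from a single pass, assuming you can get logits or per-token log-probabilities out of the model (whether, and how much of, this is exposed depends on the API). It shows how temperature reshapes a next-token distribution and computes two token-level uncertainty scores: entropy and length-normalized negative log-likelihood. These simpler measures are swapped in for illustration and are not the paper's method; unlike sampling plus clustering, they cannot tell that two differently worded answers mean the same thing. The logits and log-probs below are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    # Divide logits by the temperature before normalizing:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats); zero-probability entries contribute nothing."""
    return sum(p * math.log(1 / p) for p in probs if p > 0)


def mean_token_nll(token_logprobs: list[float]) -> float:
    # Length-normalized negative log-likelihood of one generated answer,
    # from the log-probabilities of the tokens that were actually emitted.
    return -sum(token_logprobs) / len(token_logprobs)


logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits
print(entropy(softmax(logits, 1.0)))  # fairly spread out
print(entropy(softmax(logits, 0.1)))  # close to 0: almost all mass on one token
print(mean_token_nll([-0.05, -0.2, -1.5]))  # hypothetical per-token log-probs
```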


visarga 723 days ago

Or we could just ask the same question of 3 different LLMs, ideally a large LLM, a RAG LLM and a small one, then use an LLM again to rewrite the final answer. When the models contradict each other, hallucination is likely going on, but correct answers tend to converge.
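A minimal Python sketch of the contradiction check, assuming the three models' answers have already been collected as short strings. Only the agreement step is shown; calling the models and having an LLM rewrite the final answer are left out. The model names, the `normalize` helper and the 2-of-3 threshold are hypothetical, and exact matching after normalization will miss paraphrases.

```python
import re
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical normalization: lowercase, drop punctuation, collapse spaces.
    return " ".join(re.sub(r"[^\w\s]", "", answer.lower()).split())


def cross_model_agreement(answers: dict[str, str]) -> tuple[str, float]:
    """Most common normalized answer and the fraction of models that gave it."""
    counts = Counter(normalize(a) for a in answers.values())
    best, n = counts.most_common(1)[0]
    return best, n / len(answers)


# Hypothetical answers from three models to the same question.
answers = {
    "large": "Canberra",
    "rag": "Canberra.",
    "small": "Sydney",
}
best, agreement = cross_model_agreement(answers)
if agreement < 2 / 3:  # hypothetical threshold: require at least 2 of 3
    print("models disagree; possible hallucination")
else:
    print(f"consensus: {best} ({agreement:.0%})")
```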


superb_dev 723 days ago

Why use an LLM to check the work of a different LLM?

You could use the same technique the paper describes to compare the answers each LLM gave. LLMs don’t have to be in opposition to traditional NLP techniques.
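A rough Python sketch of that suggestion: pool the answers from several LLMs and group them by meaning instead of asking another LLM to judge. It is loosely in the spirit of the greedy clustering used for semantic entropy (compare each answer with one member of each existing cluster), but the paper decides equivalence with bidirectional entailment from a language model. The word-overlap check here (`overlap_equivalent`, hypothetical, with an arbitrary 0.5 threshold) is a cheap stand-in and will, for example, treat "it is Canberra" and "it is not Canberra" as the same. The equivalence function is passed in so that a real entailment model could replace it.

```python
import math
import re
from typing import Callable


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def overlap_equivalent(a: str, b: str, threshold: float = 0.5) -> bool:
    # Hypothetical stand-in for an entailment check: call two answers
    # equivalent if the Jaccard overlap of their word sets is high enough.
    wa, wb = words(a), words(b)
    if not (wa | wb):
        return True
    return len(wa & wb) / len(wa | wb) >= threshold


def cluster_answers(
    answers: list[str], equivalent: Callable[[str, str], bool]
) -> list[list[str]]:
    """Greedily put each answer in the first cluster whose first member matches it."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([ans])
    return clusters


def cluster_entropy(clusters: list[list[str]]) -> float:
    """Shannon entropy (nats) of the cluster-size distribution."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * math.log(total / len(c)) for c in clusters)


# Hypothetical answers from three different LLMs.
answers = ["The capital is Canberra", "Canberra is the capital", "It is Sydney"]
clusters = cluster_answers(answers, overlap_equivalent)
print(len(clusters), cluster_entropy(clusters))
```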
