by PoignardAzur 893 days ago
There is empirical research showing that hallucinations in LLMs tend to vary enormously from one answer to the next, whereas accurate answers stay much more stable.

Of course, it could be that GPT-4 has been instructed to lie about its prompt, but failing that, you should expect any answer that stays the same across multiple wordings and prompting methods to be accurate.
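
For what it's worth, here is a rough sketch of that kind of consistency check in Python. It assumes you have already collected the model's answers to several rewordings of the same question; `normalize` and `agreement_rate` are made-up names for illustration, not part of any library, and exact string matching after normalization is a crude stand-in for real answer comparison.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Crude normalization (assumption): lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def agreement_rate(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and the fraction of answers matching it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Hypothetical answers to three rewordings of one question.
stable = ["Paris", "paris ", "PARIS"]
# Hypothetical answers that drift between rewordings.
unstable = ["1912", "1915", "1912", "1920"]

print(agreement_rate(stable))
print(agreement_rate(unstable))
```

A high agreement rate is only evidence of accuracy, not proof: as noted above, a model instructed to give the same wrong answer every time would also score high.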

1 comment

That's mostly intuitive.

An accurate answer is often driven by a concrete, high-confidence fact in the training dataset (e.g. a structured-data fact such as a birth date from Wikipedia).

Hallucinations are derived facts with (hopefully) low confidence. Nondeterminism is more common when scores are low. In a usable system only a few facts can get a high score, while many can get a low score, and then numeric instability can make a mess.
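
A toy illustration of the point about scores, under the assumption that candidate scores are turned into sampling probabilities with a softmax (a simplification on my part, not a claim about how any particular LLM is built). The score lists are invented. `repeat_agreement` is a hypothetical helper that computes the chance that two independent samples pick the same candidate.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def repeat_agreement(scores: list[float]) -> float:
    # Probability that two independent samples land on the same candidate:
    # the sum of squared probabilities.
    return sum(p * p for p in softmax(scores))


# One candidate with a clearly dominant score (the "Wikipedia birth date" case).
confident = [10.0, 1.0, 1.0, 1.0]
# Many candidates with nearly identical low scores (the "derived fact" case).
uncertain = [1.1, 1.0, 0.9] * 7

print(f"confident: {repeat_agreement(confident):.3f}")
print(f"uncertain: {repeat_agreement(uncertain):.3f}")
```

With one dominant score, repeated samples almost always agree. With many near-tied low scores, they rarely do, and a tiny perturbation of the scores is enough to change which candidate comes out on top.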

I'm not very familiar with LLMs, but I do have experience with traditional ML models and content-understanding production systems, and LLMs are not far from those.