by PoignardAzur
893 days ago
There is empirical research showing that hallucinations in LLMs tend to vary enormously from one answer to the next, whereas accurate answers stay stable. Of course, it could be that GPT-4 has been instructed to lie about its prompt, but failing that, you should expect any answer that stays the same across multiple wordings and prompting methods to be accurate.
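A rough way to picture the consistency check described above is the sketch below. It only scores a list of answers you have already collected from different wordings. The names `normalize` and `agreement_rate` and the placeholder answers are made up for illustration, and exact string matching is a crude stand-in for whatever comparison a real study would use.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences are not counted as disagreement.
    return " ".join(answer.lower().split())


def agreement_rate(answers: list[str]) -> float:
    """Fraction of answers that match the most common normalized answer."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


# Hypothetical answers to the same question asked three different ways.
stable = ["Answer A.", "answer  a.", "Answer A."]
unstable = ["Answer A.", "Answer B.", "Answer C."]

print(agreement_rate(stable))
print(agreement_rate(unstable))
```

A high rate across varied prompts would be weak evidence that the answer is not a hallucination, with the caveat already mentioned: a model instructed to lie consistently would also score high.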
|
An accurate answer is often driven by a concrete, high-confidence fact in the training dataset (e.g. a structured-data fact, like a birth date from Wikipedia).
Hallucinations are derived facts of (hopefully) low confidence. Nondeterminism is more common when scores are low. Only a few facts can receive a high score (in a usable system), while many can receive a low score, and then numerical instability can make a mess.
I'm not very familiar with LLMs, but I do have experience with traditional ML models and production content-understanding systems, and LLMs are not far from them.
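As a toy illustration of the point about scores (not a description of how any particular LLM works), the sketch below samples repeatedly from two made-up score vectors: one where a single candidate dominates, and one where several candidates have nearly equal low scores. The helper names and the score values are assumptions chosen only for the example.

```python
import math
import random
from collections import Counter


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_agreement(scores: list[float], n_samples: int = 1000, seed: int = 0) -> float:
    """Sample candidates by softmax probability and return the share
    of samples that picked the most frequently chosen candidate."""
    rng = random.Random(seed)
    probs = softmax(scores)
    draws = rng.choices(range(len(scores)), weights=probs, k=n_samples)
    return Counter(draws).most_common(1)[0][1] / n_samples


# Made-up scores: one dominant candidate vs. several near-tied ones.
confident = [10.0, 0.0, 0.0, 0.0, 0.0]
uncertain = [0.2, 0.1, 0.0, 0.1, 0.2]

print(sample_agreement(confident))
print(sample_agreement(uncertain))
```

With one dominant score, nearly every sample picks the same candidate. With near-tied scores, the choice spreads across candidates, so small perturbations are enough to change the output.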