My intuition on this is like training a classifier on four classes: dog, cat, cow and IDK. It feels intuitive to us but really hard to do in practice.
In the classifier case, we are leveraging a subset of data to train the model to give correct answers to unseen data. If we want the model to generalize to unseen data we need it to call unseen dog-like things a dog. If not, then all unseen dogs would be IDK.
Learning that boundary of "known vs unknown" is very hard. If done poorly, you have a model that cannot abstract to anything that is not in the dataset which is a huge part of what makes these models so impressive.
I'm sure there is more to it than this but I does not surprise me that it is an unsolved problem.
Yeah, they only “proved” hallucination is inevitable by defining it to be any case where the llm doesn’t provide the “correct” answer. By this definition, an LLM deciding not to answer is also a “hallucination”.