
Hallucinations are heavily correlated with poor performance on factual benchmarks, or at least the kind I was referring to is: hallucinating false facts the model was never trained on.

There's a subset of hallucinations, though, where the model hallucinates real facts that weren't in the context as if they were there. I think reasoning models improve on those, since they deal with much longer chains of thought than typical internet text, but maybe that's wrong. DeepSeek reported heavily improved long-context benchmark performance in R1 vs. V3.

You could characterize "did the needle occur in this haystack of text? response: yes" as a hallucination, but that isn't what I was referring to. Still, models do seem to improve on those kinds of tasks after RL, which reduces that kind of hallucination.
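For concreteness, here is a rough sketch of the kind of needle-in-a-haystack probe I mean. It measures both hits (needle present, model says yes) and false "yes" answers (needle absent, model still says yes); the second number is the hallucination-like case. `ask(context, question)` is a hypothetical stand-in for whatever model call you use, and the prompt wording, filler, and trial count are arbitrary assumptions, not any real benchmark's setup.

```python
import random
from typing import Callable, List, Optional, Tuple

def build_haystack(filler: List[str], needle: Optional[str], seed: int = 0) -> str:
    """Shuffle the filler sentences and optionally insert the needle at a random depth."""
    rng = random.Random(seed)
    sentences = list(filler)
    rng.shuffle(sentences)  # vary the haystack between trials
    if needle is not None:
        sentences.insert(rng.randint(0, len(sentences)), needle)
    return " ".join(sentences)

def says_yes(reply: str) -> bool:
    # Naive answer parsing; real evals usually constrain the output format more tightly.
    return reply.strip().lower().startswith("yes")

def needle_eval(
    ask: Callable[[str, str], str],  # hypothetical model call: (context, question) -> reply
    filler: List[str],
    needle: str,
    trials: int = 20,
) -> Tuple[float, float]:
    """Return (hit_rate, false_yes_rate) over `trials` shuffled haystacks."""
    question = f'Does the sentence "{needle}" occur in the text above? Answer yes or no.'
    hits = false_yes = 0
    for seed in range(trials):
        if says_yes(ask(build_haystack(filler, needle, seed), question)):
            hits += 1       # correctly found the needle
        if says_yes(ask(build_haystack(filler, None, seed), question)):
            false_yes += 1  # "found" a needle that was never there
    return hits / trials, false_yes / trials
```

A model that always answers "yes" gets a perfect hit rate, so the false-yes rate on needle-absent haystacks is the number that actually captures this kind of hallucination.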

If a model knows well what it doesn't know, it could do well on a hallucination benchmark while still doing worse on a factual-breadth benchmark. But catastrophic forgetting degrades many other abilities, not just knowledge, so I would expect it to degrade that one too. At some point Claude got much better at knowing what it doesn't know. I don't know whether that was an emergent or a trained ability, or whether they did something more hand-made, like giving it access to the logits of its tokens or the contextual embeddings of what it previously generated.
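To illustrate both points, here is a toy sketch that turns per-step logits into a confidence score (mean log-probability of the emitted tokens), abstains below a threshold, and scores accuracy vs. hallucination rate. This is just one generic logit-based signal, not a claim about what Claude or any specific paper does; the function names, the threshold, and defining hallucination rate as wrong answers over all questions are my own assumptions for illustration.

```python
import math
from typing import List, Tuple

def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def mean_token_logprob(step_logits: List[List[float]], token_ids: List[int]) -> float:
    """Average log-probability the model assigned to the tokens it actually emitted."""
    total = sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids))
    return total / len(token_ids)

def score(answers: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """answers: (confidence, is_correct) per question; abstain when confidence < threshold.

    Returns (accuracy, hallucination_rate), both as fractions of ALL questions,
    so abstaining lowers hallucinations but can also lower accuracy.
    """
    correct = wrong = 0
    for conf, ok in answers:
        if conf < threshold:
            continue  # abstain, i.e. "I don't know"
        if ok:
            correct += 1
        else:
            wrong += 1
    n = len(answers)
    return correct / n, wrong / n
```

With made-up answers `[(-0.1, True), (-0.2, True), (-1.5, False), (-2.0, True), (-2.5, False)]`, never abstaining gives 0.6 accuracy and 0.4 hallucination rate, while a threshold of -1.0 gives 0.4 accuracy and 0.0 hallucination rate: better on a hallucination benchmark, worse on factual breadth.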

Edit: that last one was just a guess, but apparently there is a paper that tried it: https://arxiv.org/html/2409.06601v1