Hacker News new | ask | show | jobs
by six_four_eight 808 days ago
Towards the end they state: ‘… just adding “do not hallucinate” has been shown to reduce the odds a model hallucinates.’ I find this surprising and doesn’t fit with my understanding of how a language model works. But I’m very much a novice. Would this be due to update training including feedback that marks bad responses with the term “hallucinate”?
2 comments

My mental model is “telling a LLM to not make any mistakes is like telling a depressed person to stop feeling bad”.
my model is that you need to tell give it ways to make the hallucination not the most plausible thing. I prefer to tell it that it can say "I don't know"