
> Do we even know why AIs "hallucinate"?

It's because, just like human memory, they aren't databases or search engines. Generative transformer models are basically next-word-prediction machines on steroids. They take the input and try to "guess" the most likely reply based on their training data.

These machines have no way to distinguish fact from fiction. All they have are the probabilities of word combinations, and they pick whichever combination makes the most plausible-sounding reply.
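To make that concrete, here is a toy sketch in Python. It is a simple bigram counter, not a transformer, and the corpus and function names are made up for illustration. The point is only that the "model" picks the statistically most common continuation and has no notion of whether it is true.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data". Note that the wrong answer for
# Australia appears more often than the right one.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
    "the capital of france is paris",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

def complete(counts, prompt, steps=1):
    """Greedily append the most frequent next word, `steps` times."""
    words = prompt.split()
    for _ in range(steps):
        options = counts.get(words[-1])
        if not options:
            break  # nothing ever followed this word in the corpus
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

counts = train_bigrams(corpus)
print(next_word_probs(counts, "is"))
print(complete(counts, "the capital of france is"))
```

Asked to finish "the capital of france is", this toy answers "sydney", because that is the word that most often followed "is" in its data. A real model conditions on far more context than the previous word, but the principle of choosing by plausibility rather than by truth is the same.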

> Is it possible to prevent it?

There are methods that try to prevent this by incorporating specialised knowledge databases into the training material of these models. This, however, only works with models that have been fine-tuned on very specific tasks and topics [1].

Other approaches use AI to transform human questions ("bag of words" inputs) into queries against structured knowledge bases, match the results (e.g. tree-like structures of context and facts) to the question, and turn them back into human language [2]. The downside of these methods is that they're currently limited to simple QA formats, won't feel as "natural" as talking to a chatbot, and require carefully prepared and curated knowledge databases.
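A rough sketch of that question, query, answer loop in Python might look like the following. This is not the method from [1] or [2]: the regex stands in for the learned question-to-query step, and the `KB` dictionary, `PATTERN`, and function names are all hypothetical.

```python
import re

# Hypothetical curated knowledge base: (subject, relation) -> object.
KB = {
    ("france", "capital"): "Paris",
    ("australia", "capital"): "Canberra",
}

# Hypothetical pattern standing in for a learned question parser.
PATTERN = re.compile(
    r"what is the (?P<relation>\w+) of (?P<subject>[\w ]+)\??$",
    re.IGNORECASE,
)

def parse_question(question):
    """Map a natural-language question to a (subject, relation) query."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    return (match.group("subject").strip().lower(),
            match.group("relation").lower())

def answer(question):
    """Look the query up in the KB and phrase the result as a sentence."""
    query = parse_question(question)
    if query is None:
        return "Sorry, I can't parse that question."
    fact = KB.get(query)
    if fact is None:
        # No stored fact: admit it instead of guessing a plausible word.
        return "I don't know."
    subject, relation = query
    return f"The {relation} of {subject.title()} is {fact}."

print(answer("What is the capital of France?"))
print(answer("What is the capital of Germany?"))
```

Because the answer comes from a lookup rather than from generation, a missing fact produces "I don't know." instead of an invented one. The trade-off is visible too: anything outside the one supported question shape is rejected, and every fact has to be curated by hand.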

[1] http://jens-lehmann.org/files/2019/iswc_bert_simple_question...

[2] https://arxiv.org/abs/2303.13284