by nonameiguess 1038 days ago
I'm going to talk out of my ass here because I am not involved enough to know the mechanics of how LLMs are really trained at any deep level, but from the surface level understanding I have, I would expect any attempt to eliminate hallucination to be intractable given the techniques in use. As far as I understand, the initial training run is simply fed raw text and works on the basis of predicting the next token. The resulting models are then fine-tuned using RLHF and potentially other techniques I don't know much about.

To truly eliminate hallucinations, I would think you'd have to change the initial training phase. Rather than only feeding raw text and predicting next tokens, you'd need to feed propositions labeled with some probability that they are actually true. Doing this with real fidelity is clearly not possible. No one has a database of all fact claims quantified by probability of truth. But you could potentially use the same heuristics used by human learners and impart some encoding of hierarchy of evidence. Give high weight to claims made by professional scientific organizations, high but somewhat lesser to conclusions of large-scale meta-analyses in relatively mechanistic fields, give very low weight to comments on Reddit.
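
To make that weighting idea concrete, here is a minimal sketch of a source-weighted loss. Everything in it is an assumption for illustration: the tier names, the weight values, and the `weighted_nll` helper are hypothetical, and this is not how any actual LLM is trained as far as I know.

```python
import math

# Hypothetical reliability weights per source tier. The numbers are made up
# for illustration and do not come from any real dataset.
SOURCE_WEIGHT = {
    "scientific_org": 0.95,
    "meta_analysis": 0.85,
    "reddit_comment": 0.10,
}


def weighted_nll(examples):
    """Average negative log-likelihood, weighted by source reliability.

    Each example is a pair (source_tier, prob), where prob is the
    probability the model assigned to the observed text.
    """
    total_loss = 0.0
    total_weight = 0.0
    for source, prob in examples:
        weight = SOURCE_WEIGHT[source]
        total_loss += weight * -math.log(prob)
        total_weight += weight
    return total_loss / total_weight


# Failing to predict a scientific claim costs far more than
# failing to predict a Reddit comment.
trusts_science = weighted_nll([("scientific_org", 0.9), ("reddit_comment", 0.1)])
trusts_reddit = weighted_nll([("scientific_org", 0.1), ("reddit_comment", 0.9)])
print(trusts_science < trusts_reddit)
```

The point is only that the training signal from low-credibility text gets scaled down rather than thrown away. Where the weights come from is the hard part.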

That is all entirely possible, but the manual human labor required seems antithetical to the business goals of anyone actually doing this kind of research. Without it, though, you're seemingly limited to either playing whack-a-mole, fine-tuning out specific classes of error as they're caught, or relying on the dubious assumption that the plausibly human-generated utterances you're trying to mimic are sufficiently more likely to be true than false.

This problem arguably goes away if people treat LLMs as what they are, generators of strings that look like plausible human-generated utterances, rather than generators of fact claims likely to be true. But if we really want strong AI, we clearly need the latter. There is a reason epistemologists have long defined knowledge as justified true belief, not just incidentally lucking into being correct.

1 comment

If interpretability tools could confirm that this is the case, then we would be able to train new models with purposeful decisions to reduce or remove hallucinations, and narrow the range of tests and experiments needed to solve the problem. Otherwise we are mostly speculating about why stuff doesn't work and playing darts in the dark.