
by visarga 1242 days ago

> blatantly wrong info being spewed with 105% confidence

There are some approaches. For example, the paper linked below argues that truth has a certain logical consistency that hallucinations and deception lack: a statement and its negation can't both be true. The authors use that constraint to find a latent direction that indicates truth in a frozen LLM, without any labels. This actually works better than asking the model to self-evaluate by text generation, or training with RLHF.
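To make the "latent direction" idea concrete, here is a rough sketch under heavy assumptions. It is not the paper's trained probe: it uses synthetic vectors in place of real LLM hidden states, and it takes the top principal component of the "Yes" minus "No" contrast as a simplified stand-in for the learned direction. All function names (`center`, `contrast_diffs`, `top_direction`, `make_synthetic_pairs`) are made up for illustration.

```python
import random

def center(rows):
    """Subtract the per-feature mean from a list of vectors."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]

def contrast_diffs(h_yes, h_no):
    """Center each phrasing separately, then take the pairwise difference.

    Centering removes the constant offset caused by the "Yes"/"No" token
    itself, so what remains should mostly reflect which answer is true.
    """
    yes_c, no_c = center(h_yes), center(h_no)
    return [[a - b for a, b in zip(y, n)] for y, n in zip(yes_c, no_c)]

def top_direction(diffs, iters=50, seed=0):
    """Power iteration for the top principal component of the differences."""
    rng = random.Random(seed)
    dim = len(diffs[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # Apply D^T D to v without forming the covariance matrix.
        proj = [sum(d[j] * v[j] for j in range(dim)) for d in diffs]
        v = [sum(p * d[j] for p, d in zip(proj, diffs)) for j in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

def predict(diffs, direction):
    """Label each pair by which side of the direction it falls on."""
    return [1 if sum(a * b for a, b in zip(d, direction)) > 0 else 0
            for d in diffs]

def make_synthetic_pairs(n=200, dim=16, seed=1):
    """ASSUMPTION: toy data with one hidden 'truth' axis plus noise."""
    rng = random.Random(seed)
    truth = [rng.gauss(0, 1) for _ in range(dim)]
    norm = sum(t * t for t in truth) ** 0.5
    truth = [t / norm for t in truth]
    off_yes = [rng.gauss(0, 1) for _ in range(dim)]  # "Yes" phrasing offset
    off_no = [rng.gauss(0, 1) for _ in range(dim)]   # "No" phrasing offset
    h_yes, h_no, labels = [], [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1 if label == 1 else -1
        h_yes.append([3 * sign * t + o + rng.gauss(0, 0.5)
                      for t, o in zip(truth, off_yes)])
        h_no.append([-3 * sign * t + o + rng.gauss(0, 0.5)
                     for t, o in zip(truth, off_no)])
        labels.append(label)
    return h_yes, h_no, labels

def unsupervised_accuracy(preds, labels):
    """The direction's sign is arbitrary, so score both orientations."""
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return max(acc, 1 - acc)

h_yes, h_no, labels = make_synthetic_pairs()
diffs = contrast_diffs(h_yes, h_no)
direction = top_direction(diffs)
accuracy = unsupervised_accuracy(predict(diffs, direction), labels)
```

Note the sign ambiguity: with no labels you recover an axis, but not which end of it means "true", so something extra is needed to orient it. On real models the inputs would be hidden states for the same question completed with each answer, and there is no guarantee the strongest contrast direction is truth rather than some other feature.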

"Discovering Latent Knowledge in Language Models Without Supervision" https://arxiv.org/abs/2212.03827

There's also a video with the first author: "Making LLMs Say The Truth" https://www.youtube.com/watch?v=XSQ495wpWXs&t=1515s

Btw, I think this is one of the deepest discussions about LLM hallucinations and alignment I have ever seen. Worth a watch, even if it is a bit long. It's not every day something like this comes along.

1 comment

HarHarVeryFunny 1242 days ago

Very interesting video - thanks for posting that.

It makes you wonder what other abstract concepts current models may have had to learn to get as good as they are. If they're doing a good job of modelling when someone is speaking the truth, then what else have they learnt about us?

How complete a "world model" can you learn in a purely passive way, by consuming whatever online text is available to train on, or maybe by consuming all existing written material were it to be digitized? At some point I'm sure you need to be able to interact with the world to test hypotheses, etc., but how far can predictive "intelligence" go without that?