Hacker News
by Jensson 729 days ago
> We need systems that try to be coherent, not systems that try to be unequivocally right, which wouldn't be possible.

The fact that it isn't possible to be right about 100% of things doesn't mean that you shouldn't try to be right.

Humans generally try to be right and these models don't; that is a massive difference you can't ignore. The fact that humans often fail to be right doesn't mean that these models shouldn't even try to be right.

By their nature, the models don't 'try' to do anything at all; they're just weights applied during inference, and the semantic features that are most prevalent in the training set will be the most likely to be asserted as truth.
They are trained to predict the next word so that it resembles the text they have seen, and that is what I call what they "try" to do here. A chess AI tries to win since that is what it was encouraged to do during training; current LLMs try to predict the next word since that is what they are trained to do. There is nothing wrong with using that word.

This is an accurate usage of "try": ML models at their core try to maximize a score, so whatever that score represents is what they try to do. And there is no concept of truth in LLM training, just sequences of words; there is no score for true or false.
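To make the "score" point concrete, here is a minimal sketch of the per-token loss, assuming plain cross-entropy on the next word. The probabilities and the `next_token_loss` helper are made up for illustration, not taken from any real model or library.

```python
import math

def next_token_loss(predicted_probs, corpus_token):
    """Cross-entropy for one position: -log p(word that actually followed in the text)."""
    return -math.log(predicted_probs[corpus_token])

# Hypothetical model output for the context "The capital of Australia is"
predicted = {"Canberra": 0.2, "Sydney": 0.7, "Melbourne": 0.1}

# The loss only depends on which word the training text contained.
# Nothing in it asks whether that word is factually correct.
loss_if_text_says_sydney = next_token_loss(predicted, "Sydney")
loss_if_text_says_canberra = next_token_loss(predicted, "Canberra")
```

If the training text said "Sydney", this model would be penalized less than if the text said "Canberra", even though Canberra is the right answer.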

Edit: Humans are punished as kids for being wrong all through school and in most homes, and that makes humans try to be right. That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong.

> That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong

That's not a totally accurate characterization. The base models are just trained to predict plausible text, but then the models are fine-tuned on instruct or chat training data that encourages a certain "attitude" and correctness. It's far from perfect, but an attempt is certainly made to train them to be right.

They are trained to replicate text semantically and then given a lot of correct statements to replicate, which is very different from being trained to be correct. That makes them more useful and less incorrect, but they still don't have a concept of correctness trained into them.
Exactly. If massive data poisoning happened, would the AI be able to tell what the truth is if there were as much new false information as true information? It wouldn't be able to reason about it.
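A toy version of the poisoning scenario, as a sketch only: a count-based next-word model stands in for an LLM here (a big simplification), and the corpus and the `next_word_distribution` helper are invented for the example.

```python
from collections import Counter

def next_word_distribution(corpus, context):
    """Estimate p(next word | context) by counting continuations in the corpus."""
    counts = Counter()
    n = len(context)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == context:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical corpora: four true statements, then four false ones added on top.
clean = ["water boils at 100 degrees"] * 4
poisoned = clean + ["water boils at 50 degrees"] * 4

context = ["water", "boils", "at"]
clean_dist = next_word_distribution(clean, context)        # all mass on "100"
poisoned_dist = next_word_distribution(poisoned, context)  # split evenly between "100" and "50"
```

With equal amounts of true and false text, a model that only tracks frequency has nothing left to prefer one continuation over the other.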
> Humans generally try to be right,

I think this assumption is wrong, and it's making it difficult for people to tackle this problem, because people do not, in general, produce writing with the goal of producing truthful statements. They try to score rhetorical points, they try to _appear smart_, they sometimes intentionally lie because it benefits them in any number of ways, etc. Almost all human writing is full of falsehoods, ranging from unintentional misstatements of fact to out-and-out deceptions. Forget the politically fraught topic of journalism and just look at the writing produced in the course of doing business: everything from PR statements down to Jira tickets is full of bullshit.

Any system that is capable of finding "hallucinations" or "confabulations" in AI-generated text in general should also be capable of finding them in human-produced text, which is probably an unsolvable problem.

I do think that, since the models do have some internal representation of certitude about facts, the smaller problem of finding potentially incorrect statements in their own output, based on what they know about the world, _is_ possible, though.
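One rough way to picture that smaller problem is to flag generated tokens where the model's own next-token distribution was spread out. This is only a sketch: the distributions are invented, the `flag_uncertain` helper and its threshold are hypothetical, and treating entropy as a proxy for "certitude about facts" is itself an assumption.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain(tokens, threshold=1.0):
    """Return the generated tokens whose candidate distribution exceeded the entropy threshold.

    tokens: list of (chosen_token, {candidate: probability}) pairs.
    """
    return [tok for tok, dist in tokens if entropy(dist.values()) > threshold]

# Hypothetical generation trace: one confident token, one where the model was torn.
generated = [
    ("Paris", {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}),
    ("1887", {"1887": 0.3, "1889": 0.3, "1890": 0.2, "1885": 0.2}),
]
flagged = flag_uncertain(generated)
```

This only surfaces spots where the model was unsure. A model that is confidently wrong would not be flagged, so it does nothing for the general problem.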