Hacker News new | ask | show | jobs
by mrtesthah 723 days ago
By their nature, the models don’t ‘try’ to do anything at all—they’re just weights applied during inference, and the semantic features that are most prevalent in the training set will be most likely to be asserted as truth.
2 comments

They are trained to predict next word that is similar to the text they have seen, I call that what they "try" to do here. A chess AI tries to win since that is what it was encouraged to do during training, current LLM try to predict the next word since that is what they are trained to do, there is nothing wrong using that word.

This is an accurate usage of try, ML models at their core tries to maximize a score, so what that score represents is what they try to do. And there is no concept of truth in LLM training, just sequences of words, they have no score for true or false.

Edit: Humans are punished as kids for being wrong all throughout school and in most homes, that makes human try to be right. That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong.

> That is very different from these models that are just rewarded for mimicking regardless if it is right or wrong

That's not a totally accurate characterization. The base models are just trained to predict plausible text, but then the models are fine-tuned on instruct or chat training data that encourages a certain "attitude" and correctness. It's far from perfect, but an attempt is certainly made to train them to be right.

They are trained to replicate text semantically and then given a lot of correct statements to replicate, that is very different from being trained to be correct. That makes them more useful and less incorrect, but they still don't have a concept of correctness trained into them.
Exactly, if a massive data poisoning would happen, will the AI be able to know what’s the truth is there is as much new false information than there is real one ? It won’t be able to reason about it