Hacker News
by FileSorter 763 days ago
Why is it so hard for models to say "I don't know" or "That never happened"?

This seems to be a fundamental side effect of next-token prediction, the training data, etc.

3 comments

The corpus of data wherein people say "I don't know" is small. Maybe we should all post more of that.
Just include transcripts from Congressional hearings; there are plenty of "I don't recall"s written there.
How would the LLM know whether it knows something or not? They don't deal in facts or memories, just next-word probabilities, and even if all the probabilities are low, it might just be because it has sampled an awkward turn of phrase with few common continuations.

There are solutions, but no quick band-aid.
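To make the point about low probabilities concrete, here is a toy sketch. The distributions and numbers are invented for illustration, and the "answer" label attached to each token is a hypothetical annotation that a real model does not expose. It shows two next-token distributions that look equally uncertain at the token level, even though one spreads its mass over phrasings of a single answer and the other spreads it over different answers.

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    """Shannon entropy (in bits) of a probability distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-token distributions; all numbers are made up.
# Each key is (next token, answer the continuation would end up expressing).
knows_answer = {
    ("Paris", "Paris"): 0.25,   # "Paris."
    ("The", "Paris"): 0.25,     # "The capital is Paris."
    ("It", "Paris"): 0.25,      # "It is Paris."
    ("France", "Paris"): 0.25,  # "France's capital is Paris."
}
guessing = {
    ("Paris", "Paris"): 0.25,
    ("Lyon", "Lyon"): 0.25,
    ("Nice", "Nice"): 0.25,
    ("Marseille", "Marseille"): 0.25,
}

def answer_distribution(dist):
    """Collapse token probabilities onto the answer each token leads to."""
    totals = defaultdict(float)
    for (_token, answer), p in dist.items():
        totals[answer] += p
    return dict(totals)

if __name__ == "__main__":
    for name, dist in [("knows_answer", knows_answer), ("guessing", guessing)]:
        token_h = entropy_bits(dist)
        answer_h = entropy_bits(answer_distribution(dist))
        print(f"{name}: token entropy={token_h:.2f} bits, answer entropy={answer_h:.2f} bits")
```

In this toy, the token-level entropy is identical in both cases; only the answer-level view separates them, and that view depends on labels the model doesn't have at sampling time.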

I have to assume that someone has run a trial on training these models to output answers to factual questions along with numerical probabilities, using a loss function based on a proper scoring rule of the output probabilities, and it didn't work well. That's an obvious starting point, right? All the "safety" stuff uses methods other than next-token prediction.
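For anyone unfamiliar with proper scoring rules, here is a minimal sketch of the idea, not a training recipe. The function names are my own, and the setup (a single stated probability that an answer is correct) is an assumption for illustration. The property that matters is that the expected loss is minimized by stating the true probability, so bluffing with high confidence gets penalized.

```python
import math

def brier_score(p, outcome):
    """Brier score: squared error between stated probability p and the 0/1 outcome."""
    return (p - outcome) ** 2

def log_score(p, outcome):
    """Log score: negative log-likelihood of the outcome under stated probability p."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if outcome == 1 else -math.log(1 - p)

def expected_score(score, stated_p, true_p):
    """Expected loss when the answer is actually correct with probability true_p."""
    return true_p * score(stated_p, 1) + (1 - true_p) * score(stated_p, 0)

def best_stated_probability(score, true_p, steps=100):
    """Grid-search the stated probability that minimizes the expected loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_score(score, q, true_p))

if __name__ == "__main__":
    true_p = 0.7  # hypothetical: the answer is right 70% of the time
    for name, score in [("brier", brier_score), ("log", log_score)]:
        honest = expected_score(score, true_p, true_p)
        bluff = expected_score(score, 0.99, true_p)
        best = best_stated_probability(score, true_p)
        print(f"{name}: honest={honest:.4f} bluff={bluff:.4f} best stated p={best:.2f}")
```

Both rules are "proper" in this sense: under either one, the best stated probability on the grid is the true one, and claiming 0.99 when the real hit rate is 0.7 costs more in expectation than saying 0.7.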
The safety stuff seems to be mostly about trying to locate mechanisms (induction heads, etc.) and isolate knowledge, in the pursuit of lobotomizing models to make them safe.

You could RLHF/whatever models on common factual questions to try to get them to answer those specific questions better, but I doubt there'd be much benefit outside of those specific questions.

There are a couple of fundamental problems related to factuality.

1) They don't know the sources, and source reliability, of their training data.

2) At inference time all they care about is word probabilities, with factuality only coming into it tangentially as a matter of context (e.g. factual continuations are more probable in a factual context, not in a fantasy context). They don't have any innate desire to generate factual responses, and don't introspect on whether what they are generating is factual (but that would be easy to fix).

I wonder if the training to be compliant with the prompter is part of the problem. Both of those statements are similar to saying "I refuse to answer your query".

Or maybe this is inherent to continuation?

The behavior reminds me of the human subconscious, which doesn't say no, just raises up what it can.