by hnfong 1141 days ago
AFAICT the tokens are probably the issue.

Imagine the question "In which year was Donald Trump born?"

The LLM would start the answer by either:

"Donald Trump was born in ..."

Or

"I'm sorry I don't know"

And for the vast majority of questions the first option looks more "probable", so the model starts producing tokens for an affirmative answer. If it then sees only a bunch of low-probability candidates when it tries to produce the year, it's already "too late" to backtrack in a naive GPT implementation.
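To make the "too late" part concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `next_token_probs` is a hypothetical stand-in for a model, the probabilities are invented, and `YEAR_A` to `YEAR_D` are placeholders rather than real years. It only shows that a greedy decoder commits to the confident-looking opening and then emits a year even when the year distribution is flat.

```python
import math

OPENING = "Donald Trump was born in"
REFUSAL = "I'm sorry I don't know"

def next_token_probs(prefix):
    """Hypothetical toy model; all probabilities are invented."""
    if prefix == ():
        # The affirmative opening looks far more "probable" up front.
        return {OPENING: 0.9, REFUSAL: 0.1}
    if prefix == (OPENING,):
        # The model has no real idea: a flat distribution over placeholders.
        return {"YEAR_A": 0.25, "YEAR_B": 0.25, "YEAR_C": 0.25, "YEAR_D": 0.25}
    return {"<end>": 1.0}

def greedy_decode(max_steps=3):
    """Pick the most likely next token at each step, never backtracking."""
    prefix = ()
    trace = []  # (chosen token, its probability, entropy in bits at that step)
    for _ in range(max_steps):
        probs = next_token_probs(prefix)
        token = max(probs, key=probs.get)
        entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
        trace.append((token, probs[token], entropy))
        prefix += (token,)
        if token == "<end>":
            break
    return trace

if __name__ == "__main__":
    for token, p, h in greedy_decode():
        print(f"{token!r}: p={p:.2f}, entropy={h:.2f} bits")
```

The uncertainty only shows up at the second step (high entropy), by which point the refusal is no longer reachable from the chosen prefix.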

You could train an LLM such that it responds with "I'm sorry I don't know" more often, but how do you predicate the response on "do this only if your 500B parameters don't encode the answer"? It requires self-referential logic in the model, and it isn't obvious to me how that would be done.

Maybe some smart people have figured this out, but I can see how this makes it really hard to do.

1 comments

My understanding is that backtracking isn't needed: sampling the network one token at a time gives you the expected distribution over token sequences too.

E.g. if you were to brute-force expand out to the depth of "I'm sorry I don't know" and evaluate its probability relative to all other strings, you'd find that its probability is the same as what you get by sampling one symbol at a time (though this isn't true if you do anything funny with your sampling).
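A quick way to convince yourself of this is a toy check in Python. The "model" here (`next_token_probs`, a two-word vocabulary, fixed length 3) is entirely hypothetical; the point is only that the chain-rule product of the per-token conditionals matches the frequencies you get from plain token-at-a-time sampling, with no temperature, top-k, or other tricks.

```python
import itertools
import random
from collections import Counter

VOCAB = ("yes", "no")
LENGTH = 3

def next_token_probs(prefix):
    """Hypothetical toy conditional distribution over the next token."""
    if not prefix:
        return {"yes": 0.7, "no": 0.3}
    if prefix[-1] == "yes":
        return {"yes": 0.2, "no": 0.8}
    return {"yes": 0.5, "no": 0.5}

def sequence_prob(seq):
    """Exact probability of a whole sequence via the chain rule."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= next_token_probs(seq[:i])[tok]
    return p

def sample_sequence(rng):
    """Plain ancestral sampling: draw one token at a time."""
    seq = ()
    for _ in range(LENGTH):
        probs = next_token_probs(seq)
        tokens = list(probs)
        tok = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        seq += (tok,)
    return seq

def compare(n_samples=200_000, seed=0):
    """Return (exact, empirical) probabilities for every possible sequence."""
    rng = random.Random(seed)
    counts = Counter(sample_sequence(rng) for _ in range(n_samples))
    exact = {s: sequence_prob(s) for s in itertools.product(VOCAB, repeat=LENGTH)}
    empirical = {s: counts[s] / n_samples for s in exact}
    return exact, empirical

if __name__ == "__main__":
    exact, empirical = compare()
    for s in exact:
        print(s, round(exact[s], 4), round(empirical[s], 4))
```

Once you truncate or reweight the per-token distribution, the sampled frequencies no longer match the brute-force probabilities, which is the "anything funny with your sampling" caveat.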

The problem is actually that the distribution isn't the one you want, as it doesn't say "I don't know" often enough. It's easy enough to graft on a beam search: just expand out every possibility, keep the best N, and keep expanding them. But AFAIK it doesn't help.
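For reference, the beam search described here is only a few lines. This is a minimal sketch over a made-up toy model (`next_token_probs` and its numbers are hypothetical), not any particular library's implementation. It shows the mechanics: a width-1 beam is greedy decoding, and a wider beam can recover a higher-probability sequence that greedy misses. It does nothing about the underlying distribution being the wrong one.

```python
import math

def next_token_probs(prefix):
    """Hypothetical toy model; probabilities are invented."""
    if not prefix:
        return {"A": 0.6, "B": 0.4}
    if prefix[-1] == "A":
        return {"x": 0.55, "y": 0.45}
    return {"x": 0.9, "y": 0.1}

def beam_search(beam_width, length):
    """Expand every continuation, keep the best N by log-probability, repeat."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, logp in beams:
            for tok, p in next_token_probs(seq).items():
                candidates.append((seq + (tok,), logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

if __name__ == "__main__":
    for width in (1, 2):
        seq, logp = beam_search(width, length=2)[0]
        print(width, seq, round(math.exp(logp), 2))
```

With width 1 the search commits to "A" and ends at a sequence with probability 0.33; with width 2 it keeps "B" alive and finds the sequence with probability 0.36.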

Maybe this is less true for models which have been through RLHF, though.

Seems kinda tricky to train the right behavior here. Even if the input data contained "I don't know" (surely the internet doesn't, it's full of all us fking know-it-alls), it would contain "I don't know"s relative to the writer and not the model. So if you push for it naively, you just end up with models that say they don't know, but when you ask them the same question in ROT13 they answer correctly. :P

Seems tricky for humans to learn too. Small children are fluent in English long before they're fluent in giving truthful responses. :)