by Filligree
637 days ago
There's no specific reason why LLMs couldn't be trained to say "Don't know" when they don't know. Indeed, some close examination shows separate calculation patterns when it's telling the truth, when it's making a mistake and when it's deliberately bullshitting, with the latter being painfully common. The problem is we don't train them that way. They're trained on what data is on the internet, and people... people really aren't good at saying "I don't know". Applying RLHF on top of that at least helps reduce the deliberate lies, but it isn't normal to give a thumbs-up to an "I don't know" response either. ... Of course, all this stuff does seem fixable.
Yes, there is a reason: we don't know how. We are nowhere near the level of understanding needed to tell when an LLM knows something and when it doesn't.
Training on material that includes "I don't know" will not work. That's not the solution.
If we knew how, we'd be doing it. It's the #1 user complaint, and the company that fixed it would win.