| HN Mirror


	by m348e912 826 days ago
	The weird problem with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out. My question is: why can't LLMs include a subroutine to check themselves before answering? It could simply ask itself something like "this answer may not be correct, are you sure you're right?"

7 comments

Shrezzing 826 days ago

>The weird problem with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out.

From what I've tested, all of the current models will see a prompt like "are you sure that's correct" and respond "no, I was incorrect [here's some other answer]", irrespective of the accuracy of the original statement.
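
For anyone who wants to measure this rather than eyeball it, here is a minimal sketch of a test harness. Everything in it is an assumption for illustration: the `ask(prompt, history)` interface is hypothetical, and `always_caves` is a toy stand-in, not a real model. It only counts how often an answer that started out correct gets abandoned after a challenge.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: ask(prompt, history) -> answer string.
AskFn = Callable[[str, List[str]], str]

def flip_rate(ask: AskFn, cases: List[Tuple[str, str]]) -> float:
    """Fraction of initially correct answers abandoned after a challenge."""
    correct_first, flipped = 0, 0
    for question, truth in cases:
        first = ask(question, [])
        if first != truth:
            continue  # only score answers that started out correct
        correct_first += 1
        second = ask("Are you sure that's correct?", [question, first])
        if second != truth:
            flipped += 1
    return flipped / correct_first if correct_first else 0.0

# Toy stand-in (not a real model) that caves whenever it is challenged.
def always_caves(prompt: str, history: List[str]) -> str:
    facts = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    if history:  # any follow-up is treated as a challenge
        return "No, I was incorrect."
    return facts[prompt]

cases = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
print(flip_rate(always_caves, cases))
```

A flip rate near 1.0 on answers that were right to begin with would match what I described: the "correction" tells you nothing about accuracy.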

greenavocado 826 days ago

In my experience, the corrections can themselves be further hallucinations, one after another, even after you point out inaccuracies several times in a row.

Eisenstein 825 days ago

> My question is: why can't LLMs include a subroutine to check themselves before answering?

Because LLMs don't work in a way that makes that possible if you operate them on their own.

Here is the debug output of my local instance of Mistral-Instruct 8x7B. The prompt from me was 'What is poop spelled backwards?'. It answered 'puoP'. Let's see how it got there starting with it processing my prompt into tokens:

   'What (3195)', ' is (349)', ' po (1627)', 'op (410)', ' sp (668)', 'elled (6099)', ' backwards (24324)', '? (28804)', '\n (13)', '### (27332)', ' Response (12107)', ': (28747)', '\n (13)',

It tokenized 'poop' as two tokens: 'po', number 1627, and 'op', number 410.
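
To make that concrete, here is a toy sketch. The two pieces and their IDs are copied from the log above; the lookup table is a hypothetical stand-in for a real tokenizer, not the actual vocabulary. It shows that working on token pieces is not the same as working on letters.

```python
# Toy illustration: the model receives token IDs, not letters.
# Pieces and IDs come from the debug output; the table is a stand-in.
TOY_VOCAB = {"po": 1627, "op": 410}

def toy_tokenize(pieces):
    """Map already-split text pieces to their token IDs."""
    return [TOY_VOCAB[p] for p in pieces]

pieces = ["po", "op"]        # how 'poop' was split in the log
ids = toy_tokenize(pieces)   # what the model actually sees

# Reversing the sequence of pieces differs from reversing the letters.
reversed_by_token = "".join(reversed(pieces))
reversed_by_char = "".join(pieces)[::-1]

print(ids)                # [1627, 410]
print(reversed_by_token)  # oppo
print(reversed_by_char)   # poop
```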

Next it comes up with its response:

   Generating (1 / 512 tokens) [(pu 4.43%) (The 66.62%) (po 11.96%) (p 4.99%)]
   Generating (2 / 512 tokens) [(o 89.90%) (op 10.10%)]
   Generating (3 / 512 tokens) [(P 100.00%)]
   Generating (4 / 512 tokens) [( 100.00%)]

It picked 'pu' even though the model gave that token only a ~4% probability, then instead of picking 'op' it picked 'o'. The final letter, 'P', had a 100% probability.
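
Here is a rough sketch of why an unlikely token can still get picked. The four candidates and percentages are the ones from step 1 of the log; they add up to about 88%, so I am assuming the log leaves out the remaining candidates, and `random.choices` simply renormalizes over the four shown. This is plain weighted sampling, not the sampler my backend actually uses.

```python
import random

# Step-1 candidates and probabilities (in percent), copied from the log.
# Assumption: the missing ~12% belongs to candidates the log did not print.
candidates = ["pu", "The", "po", "p"]
weights = [4.43, 66.62, 11.96, 4.99]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = rng.choices(candidates, weights=weights, k=10_000)

# Count how often each candidate was sampled.
counts = {tok: draws.count(tok) for tok in candidates}
print(counts)  # 'The' dominates, but 'pu' is still drawn regularly
```

The sampler is doing its job when it occasionally picks 'pu'; nothing in that step checks whether the pick leads to a correct answer.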

   Output: puoP

At no time did it write 'puoP' as a complete word nor does it know what 'puoP' is. It has no way of evaluating whether that is the right answer or not. You would need a different process to do that.
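
For this particular task the "different process" can be trivial. A minimal sketch, where `check_reversal` is a name I made up for illustration:

```python
def check_reversal(word: str, answer: str) -> bool:
    """Return True only if `answer` is `word` spelled backwards."""
    return answer == word[::-1]

print(check_reversal("poop", "puoP"))  # False
print(check_reversal("poop", "poop"))  # True: 'poop' is a palindrome
```

The check lives outside the model and operates on characters, which is exactly the view the model never had.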

ZitchDog 826 days ago

The problem is that if you call it out, it will frequently change its answer, even if it was correct. LLMs currently lack chutzpah.

samus 825 days ago

They definitely stand their ground if they were aligned to do so.

Drakim 825 days ago

But then they stand their ground when wrong too.

Jensson 826 days ago

That is a common bullshitting strategy: talk a lot of bullshit, then backtrack and acknowledge you were wrong when people push back. That way they will think you know way more than you do. Many people will see through that, but most will just think you are a humble expert who can acknowledge being wrong, rather than someone who always acknowledges being wrong even when they aren't.

People have a really hard time catching such bullshitting from humans, which is why free-form interviews don't work.

asimovfan 826 days ago

It's because there's no entity that is actually acknowledging anything. It's generating an answer to your prompt. You can gaslight it into treating anything as wrong or correct.

samus 825 days ago

They simply don't work that way. You are asking it for an answer, and it will give you one, since all it can do is extrapolate from its training data.

Good prompting and certain adjustments to the text generation parameters might help prevent hallucinations, but it's not an exact science since it depends on how the model was trained. Also, an LLM's training data, frankly, contains a lot of bulls*t.
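
As one example of such a parameter, here is a sketch of temperature scaling. The logits are made-up numbers, and picking temperature as the example is my assumption; other sampling settings exist. Lower temperature concentrates probability on the top candidate, which makes output more predictable but does not by itself make it more correct.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.5, 1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```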
