by mike_hearn
889 days ago
That's interesting, but regardless of the underlying cause, the effect is deception by any standard: it knows a thing exists, and yet it claims no such thing exists when asked. We don't let humans off the hook if they deceive us but there's an explanation for it. If it's genuinely due to a weird reasoning failure rather than some social bias it's picked up, then that's hopeful, as that would make it in principle fixable.

Yes, we do. If I ask you to sing the alphabet song backwards instantly, and you have to pause for a few seconds and think and rehearse the alphabet mentally, we don't say 'ah, Mike just tried to deceive us! He refused to answer, pretending he didn't know, but eventually admitted he could sing it backwards after all'. Similarly, I can't imagine why an LLM would 'deceive' us about Tom Cruise's mother. It's simpler to just say that LLMs are not logically omniscient any more than humans are, and they just seem to be worse in this specific example.
> as that would make it in principle fixable.
Yes, see the discussion there: most people think it could be fixed by training with reversed text (or possibly the equivalent, bidirectional losses), but AFAIK it has not been done yet.
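
To make "training with reversed text" concrete, here is a rough sketch of what the data side might look like. It is only an illustration: the helper names `reverse_words` and `augment_with_reversals` are made up for this example, reversing at the word level is an assumption (one could also reverse at the token or entity level), and nothing here is taken from an existing training pipeline.

```python
def reverse_words(text: str) -> str:
    """Return the text with its whitespace-separated words in reverse order."""
    return " ".join(reversed(text.split()))


def augment_with_reversals(corpus: list[str]) -> list[str]:
    """Return each training string followed by its word-reversed copy.

    Hypothetical augmentation step: the model would then see every fact
    in both directions during ordinary next-token training.
    """
    augmented = []
    for text in corpus:
        augmented.append(text)
        augmented.append(reverse_words(text))
    return augmented


corpus = ["Tom Cruise's mother is Mary Lee Pfeiffer"]
for line in augment_with_reversals(corpus):
    print(line)
```

The idea is that the reversed copy puts 'Pfeiffer Lee Mary' before 'Cruise's Tom', so the association gets trained in the direction the model otherwise never sees. Whether this actually fixes the problem at scale is the open question.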