by mjburgess 1096 days ago
> Then they don't in LLMs too

LLMs don't get drunk.

If a child answers questions from a book of answers then they'll appear to understand the domain insofar as those questions appear. They do not.

They will fail to answer questions under, e.g., permutations of words (say, a question asks about "norepinephrine" but the book only contains "noradrenaline").

Insofar as a human cannot answer questions under trivial linguistic permutations then they too do not understand the domain.

But these are not the kinds of failures experienced by those who have some capacity, e.g., for counter-factual reasoning about their environment's physics.

In those people it is environmental illusion and cognitive impairment -- not trivial permutations of phrasing which lead to catastrophic loss of apparent understanding.

Cognitive impairment = reasoning machine is broken

Environmental illusion = data is ambiguous and actions cannot resolve it

These "failure modes" are expected if you actually have the relevant capacity.

2 comments

> LLMs don't get drunk.

Well actually they sort of can...

https://www.reddit.com/r/LocalLLaMA/comments/13vv941/tempera...

> Insofar as a human cannot answer questions under trivial linguistic permutations then they too do not understand the domain.

alright let me humor you for a bit. Let's start with some solid examples of GPT-4 failing this "trivial linguistic permutation" then?

see, just one reference in the paper: https://arxiv.org/pdf/2302.08399.pdf
Yes, by changing the words

The whole point is that irrelevant word permutation should not "turn on" or "turn off" this capacity.

That you can "prompt engineer" your way to the answer shows that the prompt engineer knows the answer and can "use the right search terms" to find it.

"But the bag is transparent" is not "irrelevant word permutation" and neither is the additive question that spurs the correct resolution. And it certainly isn't random.

A human who isn't paying attention could fail the question too, which is kind of the point I'm making.

There's no way a model that can't model protein structures does this - https://www.researchgate.net/publication/367453911_Large_lan...

>> alright let me humor you for a bit.

That's real class, right there.