HN Mirror


by mjburgess 1096 days ago

> Then they don't in LLMs too

LLMs don't get drunk.

If a child answers questions from a book of answers, then they'll appear to understand the domain insofar as those questions appear in the book. They do not understand it.

They will fail to answer questions under, e.g., permutations of words (say, a question asks about "norepinephrine" but the book only contains "noradrenaline", etc.).
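
A minimal sketch of the answer-book point, assuming a hypothetical one-entry lookup table (`ANSWER_BOOK` and `answer_from_book` are made up for illustration and are not a claim about how any LLM works internally):

```python
from typing import Optional

# Hypothetical "book of answers": exact question strings mapped to answers.
ANSWER_BOOK = {
    "what does noradrenaline do?": "It raises heart rate and blood pressure.",
}


def answer_from_book(question: str) -> Optional[str]:
    """Return the stored answer only if the question matches an entry exactly."""
    return ANSWER_BOOK.get(question.strip().lower())


# The question as it appears in the book is answered.
in_book = answer_from_book("What does noradrenaline do?")

# The same question with a synonym swapped in finds nothing,
# even though the two words name the same molecule.
permuted = answer_from_book("What does norepinephrine do?")

print(in_book)
print(permuted)
```

The lookup has no notion that the two terms refer to the same thing, so the synonym alone is enough to lose the answer.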

Insofar as a human cannot answer questions under trivial linguistic permutations, then they too do not understand the domain.

But these are not the kinds of failures experienced with those who have some capacity, e.g., for counter-factual reasoning about their environment's physics.

In those people it is environmental illusion and cognitive impairment, not trivial permutations of phrasing, which lead to catastrophic loss of apparent understanding.

Cognitive impairment = reasoning machine is broken

Environmental illusion = data is ambiguous and actions cannot resolve it

These "failure modes" are expected if you actually have the relevant capacity.

2 comments

moffkalast 1096 days ago

> LLMs don't get drunk.

Well actually they sort of can...

https://www.reddit.com/r/LocalLLaMA/comments/13vv941/tempera...


famouswaffles 1096 days ago

>Insofar as a human cannot answer questions under trivial linguistic permutations then they too do not understand the domain.

alright let me humor you for a bit. Let's start with some solid examples of GPT-4 failing this "trivial linguistic permutation" then?


mjburgess 1096 days ago

see, just one reference in the paper: https://arxiv.org/pdf/2302.08399.pdf


famouswaffles 1096 days ago

they can answer those

https://medium.com/@nathanbos/prompting-better-theory-of-min...


mjburgess 1096 days ago

Yes, by changing the words.

The whole point is that irrelevant word permutation should not "turn on" or "turn off" this capacity.

That you can "prompt engineer" your way to the answer shows that the prompt engineer knows the answer and can "use the right search terms" to find it.


famouswaffles 1096 days ago

"But the bag is transparent" is not "irrelevant word permutation" and neither is the additive question that spurs the correct resolution. And it certainly isn't random.

A human who isn't paying attention could fail the question too, which is kind of the point I'm making.

There's no way a model that can't model protein structures does this - https://www.researchgate.net/publication/367453911_Large_lan...


YeGoblynQueenne 1096 days ago

>> alright let me humor you for a bit.

That's real class, right there.
