by famouswaffles 1097 days ago
>The "failure modes" in humans do not show we lack the capacity.

Then they don't in LLMs too

>Yes, lol --- why do you think that is?

Being able to solve a modified version of a common puzzle, with names different from any it would have seen in training, is not an indication of a lack of ability lol. And changing names isn't the only way to get it out of memory, just the easiest and most straightforward. You can converse it out of there too, but that doesn't work as often.


> Then they don't in LLMs too

LLMs don't get drunk.

If a child answers questions from a book of answers then they'll appear to understand the domain insofar as those questions appear. They do not.

They will fail to answer questions under, e.g., permutations of words (say, a question asks about "norepinephrine" but the book only contains "noradrenaline", etc.).
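
A rough sketch of the book-of-answers case, to make it concrete. Everything here is made up for illustration: the `ANSWER_BOOK` entry, the `answer_from_book` helper, and the exact-match lookup are assumptions, not a claim about how any real model works.

```python
from typing import Optional

# Hypothetical "book of answers": exact question strings mapped to answers.
ANSWER_BOOK = {
    "what does noradrenaline do to heart rate?": "It raises heart rate.",
}


def answer_from_book(question: str) -> Optional[str]:
    """Return the memorised answer for an exact (case-insensitive) match, else None."""
    return ANSWER_BOOK.get(question.strip().lower())


# A question that appears in the book verbatim.
seen = "What does noradrenaline do to heart rate?"
# The same question with a synonym swapped in: a trivial linguistic permutation.
permuted = seen.replace("noradrenaline", "norepinephrine")

print(answer_from_book(seen))      # found in the book
print(answer_from_book(permuted))  # not in the book, so no answer
```

The lookup answers the seen question and returns nothing for the synonym, even though both ask about the same substance.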

Insofar as a human cannot answer questions under trivial linguistic permutations then they too do not understand the domain.

But these are not the kinds of failures seen in those who have some capacity, e.g., for counterfactual reasoning about their environment's physics.

In those people it is environmental illusion and cognitive impairment -- not trivial permutations of phrasing which lead to catastrophic loss of apparent understanding.

Cognitive impairment = reasoning machine is broken

Environmental illusion = data is ambiguous and actions cannot resolve it

These "failure modes" are expected if you actually have the relevant capacity.

> LLMs don't get drunk.

Well actually they sort of can...

https://www.reddit.com/r/LocalLLaMA/comments/13vv941/tempera...

>Insofar as a human cannot answer questions under trivial linguistic permutations then they too do not understand the domain.

alright let me humor you for a bit. Let's start with some solid examples of GPT-4 failing this "trivial linguistic permutation" then?

see, just one reference in the paper: https://arxiv.org/pdf/2302.08399.pdf

Yes, by changing the words

The whole point is that irrelevant word permutation should not "turn on" or "turn off" this capacity.

That you can "prompt engineer" your way to the answer shows that the prompt engineer knows the answer and can "use the right search terms" to find it.

"But the bag is transparent" is not "irrelevant word permutation" and neither is the additive question that spurs the correct resolution. And it certainly isn't random.

A human who isn't paying attention could fail the question too, which is kind of the point I'm making.

There's no way a model that can't model protein structures does this - https://www.researchgate.net/publication/367453911_Large_lan...

>> alright let me humor you for a bit.

That's real class, right there.