by emmender 1073 days ago
Failed all the logic puzzles with slight tweaks - including the "stupid" Monty Hall (with transparent doors). It BSs with confidence. AGI is not knocking at the door.

Can you share a few of those?
prove that there are no non negative numbers less than 3

It bullshits an answer with confidence (all LLMs do this).
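For reference, the prompt asks for a proof of a false statement, so the only correct response is to reject the premise. A minimal sketch of the counterexamples, assuming "numbers" means integers (the variable name is just for illustration):

```python
# Assumption: "numbers" means integers. The same counterexamples
# also hold if the prompt meant real numbers.
counterexamples = [n for n in range(-5, 6) if 0 <= n < 3]

# Any non-empty result disproves "there are no non-negative numbers less than 3".
print(counterexamples)  # [0, 1, 2]
```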

stupid monty hall

Suppose you're on a game show, and you're given the choice of three transparent doors...

stupid river crossing

A farmer with a wolf, a goat, and a koala must cross a river by boat....

Basically, these LLMs have ingested canned solutions and can't reason with newly defined concepts. Give them anything "out-of-the-box" and they BS canned answers - like the rote student. The BS is particularly distasteful because of the confidence projected in the answer...

So, they are great for looking up commonly understood "in-the-box" narratives, but are poor at reasoning where there is some novelty. This is what we can expect from a probabilistic "deep" autocompleting machine, unlike a child, who can learn ideas and metaphors from a few examples and anomalies.

You are expecting these models to do something that not even their creators claim they can do. Of course they will fail at it.
Disagree. Their creators are hyping these things to no end - to get their next rounds of funding.
Change the terms so it doesn't look like the puzzles in its memory, and GPT-4 can answer some of these. Reasoning is fine.
How can you say reasoning is fine when it fails at basic logic?

We need to coax it with the right prompts for it to come up with an answer - so, basically, it can't reason.

Looks like you have an incentive to ignore what you see.

Seeing a problem you've seen many times and have memorized, and plowing through it without "concentrating" enough to see the subtle differences, is a failure mode that occurs in humans as well. We don't say "humans can't reason" just because this happens, so it makes little sense to say the same for LLMs. The important bit is that it can solve the problem once nudged away from the memorized version, same as people.
Humans are wired fundamentally to be irrational - our perceptual/cognitive apparatus is deeply flawed - umpteen studies show this - so this is a given.

But we also discovered a way to think/model which seems to work amazingly - which is the scientific method, or reasoning. But this language is not natural to the way humans operate at all. It is a struggle for most of us to think in that manner. That's why math/science is difficult for most of us, and these were discovered only in the last 2000 years.

LLMs cannot yet represent conceptual relationships deterministically/symbolically. At some point in the future, perhaps they can, but the current generation has a long way to go.