by watutalkinbout 5 days ago
Algorithms and data that emulate responses aren't smart.

A 5 year old knows if you want to wash your car, you need to take it to the car wash.

2 comments

Can a 5 year old write a substantial program on spec, that passes the requirements and given tests, in a few minutes?

If not, then perhaps this comparison is not the be-all and end-all.

"A ship is useless, it can't drive over land..."

5 year olds and AI both have jagged intelligence.

Also, it's AI, not "artificial code generation intelligence". The ship is your view of the product, shoehorned into something specific.

But it demonstrates that LLMs struggle with basic reasoning. A criticism of LLMs is that they're imitating without an understanding of what they're doing and without a clear plan, so this inability to solve a simple logic puzzle is very relevant. If LLMs didn't struggle with reasoning problems then something like ARC-AGI wouldn't exist.
It's a question designed to fool the AI. It's like saying that a person doesn't understand the limitations of reality when they fall for a magic trick.
These aren't the same thing.

You can't fool an AI because it isn't using its own judgement.

You can fool an AI as evidenced by them being fooled. They demonstrably appear to be working through a problem and get fooled by the wording of the problem. If you think differently, merely asserting it is not the way to convince people that what they see is wrong.
You can't fool an inanimate object.

An LLM is (effectively) just a really, really elaborate "choose your own adventure" book.

It's not "working through" problems, it's just tracing a route through a pre-defined information space. It's not actually thinking, it just does a good impression of it.

You're describing a tool.

Tools can do very useful things, but they aren't intelligent.

Referring to John Woolridge's recent talk? "The car wash is only 200 feet away, should I drive or walk?"

His slide showed Opus 4.6 saying "walk". I couldn't get 4.6 to do that.