Hacker News
by joe_the_user 623 days ago
Oh, thanks for the correction. I did misinterpret.

Illustrates that language is hard for humans too, hah.

Anyway, the "next iteration solves it" effect is definitely a result of common problems leaking into the training data. But it could also be a result of LLMs being universal but not efficiently universal problem solvers, combined with people tending to choose the simplest problem that can't yet be solved (such theories seem illustrative).

Also, your river-crossing problems seem quite useful.

1 comment

  > hah
And? That's not the issue with LLMs.

The issue is an inability to reason. Sure, a human might also have difficulty with river-crossing problems, even trivial ones, but I can't get a person to tell me that all the animals fit in the boat, then put all the animals into the boat, and then proceed to make multiple trips across the river. If they get the first two, they always get the right answer. But this is not true for an LLM. That's a very clear demonstration of a lack of reasoning and of a world model.
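To make the boat example concrete, here's a rough sketch of the consistency I mean. It assumes a simplified puzzle with no "wolf eats goat" style constraints, and `min_crossings` and its parameters are names I made up for illustration. The point is only that once you accept "everything fits in the boat", a single crossing follows.

```python
import math

def min_crossings(num_items: int, boat_capacity: int) -> int:
    """Minimum one-way crossings to move num_items across a river.

    Assumption (mine, not from any specific puzzle): the rower does not
    count against boat_capacity, and there are no compatibility
    constraints between items.
    """
    if boat_capacity < 1:
        raise ValueError("boat must carry at least one item")
    if num_items == 0:
        return 0
    # Each loaded trip carries up to boat_capacity items; every loaded
    # trip except the last one needs an empty return trip.
    loaded_trips = math.ceil(num_items / boat_capacity)
    return 2 * loaded_trips - 1

print(min_crossings(3, 3))  # 1: everything fits, so one trip
print(min_crossings(3, 1))  # 5: three loaded trips plus two returns
```

A person who says "all 3 fit" and then loads all 3 is in the first case. Answering with multiple trips after that contradicts their own premise.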

It's not about coaching or finding the right prompt, it's that the logic is inconsistent and unreasonable (yes, humans will fail at logic, but *reasoning doesn't mean correct answer*). It fails to meet the basic definition of reasoning.

The whole fucking goal is generalization. That's the G in AGI and the most important of those 3 letters. We don't have strong evidence of generalization. For GI we want out-of-distribution generalization, but we're not even doing so well at in-distribution generalization. That's demonstrated by the river-crossing puzzles, Cheryl's birthday, and the recently famous 9.8 vs 9.11 (https://x.com/sainingxie/status/1834300251324256439).
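For the 9.8 vs 9.11 case, here's a small sketch assuming the question is "which number is larger?" read as plain decimals. The version-number reading is my own guess at one way to arrive at the other answer, not something I'm claiming about the linked post or about what a model does internally. Function names are made up for illustration.

```python
from decimal import Decimal

def larger_as_decimal(a: str, b: str) -> str:
    """Return whichever string is larger when read as a decimal number."""
    return a if Decimal(a) > Decimal(b) else b

def larger_as_version(a: str, b: str) -> str:
    """Return whichever string is larger when read as dotted version parts.

    Assumption: each dot-separated part is compared as an integer,
    the way software version numbers usually are.
    """
    parts_a = tuple(int(p) for p in a.split("."))
    parts_b = tuple(int(p) for p in b.split("."))
    return a if parts_a > parts_b else b

print(larger_as_decimal("9.8", "9.11"))  # 9.8
print(larger_as_version("9.8", "9.11"))  # 9.11
```

Both readings are internally consistent. The failure is picking the wrong one for a plain arithmetic question, or flipping between them.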

Yes, the next iteration will get better. But better in which direction? Being dismissive of what it fails at just means you don't get better in that direction unless you get lucky.