Hacker News
by wbarber 843 days ago
What's to say this isn't just a demonstration of memorization capabilities? Rephrasing the logic of the question, or even simply randomizing the order of the multiple-choice answers, often dramatically impacts performance. Likewise, every model in the Claude 3 family repeats the memorized solution to the lion, goat, wolf riddle regardless of how I modify the riddle.

If the answers were Googleable, presumably smart humans with Internet access wouldn't do only barely better than chance?
GPT-4 used to have the same issue with this puzzle early on, but they've fixed it since then (the fix landed around mid-2023).
The fix is to train it on this puzzle and its variants, meaning it memorized this pattern. It still fails similar puzzles given in a different structure, until they feed it that structure as well.

LLMs are more like programming than human intelligence: they need to program in the solutions to these riddles, very much like we programmed expert systems in the past. The main new thing we get here is natural language compatibility; other than that, the programming seems to be the same as, or weaker than, the old programming of expert systems. The other big thing is that there are already a ton of solutions on the web written in natural language, such as all the tutorials, so you get all of those programs for free.

But other than that, these LLMs seem to have exactly the same problems, limitations, and strengths as expert systems. They don't generalize flexibly enough to solve problems the way a human does.