> LLMs are PhD-level reasoners in math and science, yet they fail at children's puzzles. How is this possible? Because they are not. Pattern matching questions on a contrived test is not the same thing as understanding or reasoning. It’s the same reason why most of the people who pass your leetcode tests don’t actually know how to build anything real. They are taught to the test not taught to reality.
Do submarines swim? I don't really care, as long as it gets me where I want to go. The fact is that just two days ago, I asked Claude to look at some reasonably complicated concurrent code to which I had added a new feature, and to list what tests needed to be added. When I then asked GPT-5 to add them, it one-shot nailed the implementations. I've written a gist of it here:
https://gitlab.com/-/snippets/4889253
Seriously, just read the description of the test it's trying to write.
In order to one-shot that code, it had to understand:
- How the cache was supposed to work
- How conceptually to set up the scenario described
- How to assemble golang's concurrency primitives (channels, goroutines, and waitgroups), in the correct order, to achieve the goal.
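For anyone who hasn't written this kind of test, here is a minimal sketch of the general shape. It is not the code from the snippet above: `Cache`, `Get`, and `countLoads` are hypothetical names invented for illustration, and the cache here is a deliberately simple load-once cache that holds its lock while loading. The point is only to show how a channel (as a start gate), goroutines, and a waitgroup have to be put together in the right order.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical load-once cache, invented for this sketch.
type Cache struct {
	mu    sync.Mutex
	items map[string]string
}

func NewCache() *Cache {
	return &Cache{items: make(map[string]string)}
}

// Get returns the cached value for key, calling load only on a miss.
// The lock is held during load, so concurrent callers wait for the first one.
func (c *Cache) Get(key string, load func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.items[key]; ok {
		return v
	}
	v := load()
	c.items[key] = v
	return v
}

// countLoads releases n goroutines at once against a single key and
// reports how many times the loader actually ran.
func countLoads(n int) int {
	cache := NewCache()
	start := make(chan struct{}) // closed to release all goroutines together
	var wg sync.WaitGroup
	var mu sync.Mutex // guards loads
	loads := 0

	for i := 0; i < n; i++ {
		wg.Add(1) // register before the goroutine starts
		go func() {
			defer wg.Done()
			<-start // block until the gate opens
			cache.Get("k", func() string {
				mu.Lock()
				loads++
				mu.Unlock()
				return "v"
			})
		}()
	}

	close(start) // open the gate: every goroutine proceeds
	wg.Wait()    // wait for every caller to finish before reading loads
	return loads
}

func main() {
	fmt.Println("loader calls:", countLoads(16))
}
```

With this particular cache, the loader should run exactly once no matter how many callers race for the key. Getting the order wrong (for example, calling `wg.Add` inside the goroutine, or reading `loads` before `wg.Wait`) is the kind of mistake that makes such tests flaky.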
Did it have a library of concurrency testing patterns in its head? Probably -- so do I. Had it ever seen my exact package before in its training? Never.
I just don't see how you can argue with a straight face that this is "pattern matching". If that's pattern matching, then pattern matching is not an insult.
If anything, the examples in this article are the opposite. Take the second example, which is basically 'assemble these assorted pieces into a rectangle'. Nearly every adult has assembled at least dozens of things in their life; many have assembled thousands. So in this case it's the humans who are simply "pattern matching questions on a contrived test", and the LLMs, which almost certainly didn't have a lot of "assemble these items" examples in their training data, that are reasoning out what's going on from first principles.