by simianwords 201 days ago
I like the central point of this article, which is top-down vs bottom-up thinking.

But I wonder if there is a falsifiable, formal definition that would let us say whether models (or anything, for that matter) _do_ think.

The normal reply to ChatGPT getting a question right is that it simply extrapolated from what was already in the training data set. But I feel like the degree to which something "thinks" is its ability to generalise from what it already knows.

This generalisation needs some formality, maybe some mathematical notation (like the opposite of overfitting). By generalisation I mean the ability to get something correct that lies pretty far from the training data.
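
One rough way to write this down, as a sketch and not an established definition of "thinking": the first line is the usual generalisation gap from learning theory, and the second is accuracy restricted to inputs at least some distance away from the training data. The symbols here ($f$ for the model, $P$ for the distribution of problems, $\ell$ for a loss, $d$ for a distance from an input to the training set $D_{\text{train}}$) are assumptions for illustration, and picking a sensible $d$ is the hard part.

```latex
% Generalisation gap: expected loss on fresh problems minus average loss
% on the n training examples. A small gap is the "opposite of overfitting".
\mathrm{gap}(f) \;=\; \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x),y)\big]
  \;-\; \frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i),y_i\big)

% Accuracy on problems at least r away from anything in the training set.
% d is a hypothetical distance measure, not something defined in this thread.
\mathrm{acc}_f(r) \;=\; \Pr_{(x,y)\sim P}\big[\,f(x)=y \;\big|\; d(x, D_{\text{train}}) \ge r\,\big]
```

Under these assumptions the claim becomes testable: if the model were only memorising, $\mathrm{acc}_f(r)$ should collapse once $r$ moves away from zero, whereas generalisation predicts it stays high out to larger $r$.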

I suggest this because GPT can solve pretty much any high school math problem you throw at it, and it can do so better than 99% of humans. This is clearly not just memorising training data but doing something more. If it were not generalising, it couldn't possibly solve all these new high school level problems.

But the extent of this decreases as you move up to undergraduate mathematics, where it can still solve most problems you throw at it but not all, and it drops further at PhD level. So the "thinking" ability of GPT sits somewhere in between, on a spectrum. But I don't think you can flatly say that it can never generalise to PhD level mathematics. It could do it for high school, so why not PhD?

If, hypothetically, it could solve PhD level mathematics, would people still claim that LLMs don't think?