by nicklecompte 905 days ago
> Also how do you prove that GPT is worse at counting?

Back in June 2023 GPT-4 was dramatically worse at counting than a pigeon in the sense that it couldn't accurately tell the difference between sentences with 3 words and sentences with 5 words, whereas pigeons can count almost anything up to about 10. It also routinely failed "pick the shorter sentence" tests which I literally took from a test administered to mice. GPT simply doesn't understand what numbers are, whereas pigeons and mice have an intuitive understanding similar to toddlers. You don't need to teach kids what 3 means, you just need to teach them the human symbol for the concept of 3. GPT only has the human symbol and does not seem capable of understanding the concept.

In my testing GPT-4 consistently failed counting / pattern-recognition tests even if you used "chain-of-thought" prompting. As far as I could tell its only true understanding of numbers was "one, two, many." This seems reflected in real use cases, where GPT routinely (and hilariously) ignores commands to return 50 words/etc of output. GPT doesn't know what fifty means, it just knows what various documents that say "word count: 50" look like, and tries to imitate the tone.
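For anyone who wants to reproduce this kind of check, here is a minimal sketch of the scoring side, not the harness I actually used. It assumes a "word" is a whitespace-separated chunk and that being within 10% of the requested length counts as compliance; both choices are assumptions, and the function names are made up for illustration. Getting the model output itself is left out.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words (a crude definition, assumed here)."""
    return len(text.split())


def meets_word_target(output: str, target: int, tolerance: float = 0.1) -> bool:
    """Return True if the word count is within `tolerance` (a fraction) of `target`."""
    return abs(count_words(output) - target) <= tolerance * target


def shorter_sentence(a: str, b: str) -> str:
    """Ground truth for a 'pick the shorter sentence' item, measured in words."""
    return a if count_words(a) <= count_words(b) else b
```

You would compare `shorter_sentence(a, b)` against whatever the model picked, and run `meets_word_target(reply, 50)` on replies to a "return 50 words" prompt.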

Since transformer neural networks lack recursion I conjecture that GPT will never be able to understand a number larger than 2, even if in specific cases it can solve counting problems up to eleventy billion. This is what I mean by "counting apples, not oranges": its sense of counting is paper-thin and easily fooled by adversarial prompts. It is much harder to fool a mouse or a pigeon.

Many of the tests I ran back in April 2023 no longer work. I strongly suspect this is because OpenAI trained GPT on many of the tests that people were throwing at it, and not because GPT actually became "smarter." I stopped messing around with GPT specifically because OpenAI doesn't issue any release notes, making replicability impossible. Mistral's 7B model was dramatically worse than even GPT-3 at counting, but I doubt they trained it to count. Not sure about LLaMA/etc.

3 comments

When you talk about "counting", do you mean the logical process of going "one", "two", "three"... or do you mean the ability to statistically estimate a quantity from the amount of signal you are processing?

E.g. are pigeons actually "counting" in the sense of the exact process humans use to get an accurate number? Or are they just responding to the signal? Similar to how a person can tell whether one sound is higher or lower in pitch than another without being able to state the exact frequency.

Because to me pigeons are just similarly responding to the amount of "signal" they are receiving, not actually doing abstract reasoning.

And looking at the studies, it also seems that the pigeons had to be trained to count; they weren't able to do it out of the box.

By the way, your criticism of GPT's ability to count the words in a sentence seems quite odd to me, because the input GPT receives is actually tokens, not the words you give it.

So imagine someone asked a question in English, it was translated into hieroglyphs before it reached you, and you didn't know English. Would you be able to count how many words were in the original English?

So it seems weird to expect that GPT would be able to count in the first place.
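To make the token point concrete, here is a toy sketch. It is not GPT's real tokenizer (that uses a learned byte-pair encoding with a much larger vocabulary); `TOY_VOCAB` and `toy_tokenize` are hypothetical, and the greedy longest-match rule is just an assumption that keeps the example short. The point is only that the pieces the model sees do not line up with words.

```python
# Hypothetical toy vocabulary. Real GPT vocabularies are learned BPE merges
# with tens of thousands of entries; this one is hand-picked for illustration.
TOY_VOCAB = {"count", "ing", " pig", "eons", " is", " hard"}


def toy_tokenize(text: str, vocab=TOY_VOCAB) -> list:
    """Greedy longest-match split; characters not covered become single tokens."""
    tokens = []
    max_len = max(len(v) for v in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "counting pigeons is hard"
tokens = toy_tokenize(sentence)
print(tokens)                 # ['count', 'ing', ' pig', 'eons', ' is', ' hard']
print(len(sentence.split()))  # 4 words
print(len(tokens))            # 6 tokens
```

Four words come out as six tokens here, and nothing in the token sequence marks where a word starts or ends other than the leading spaces the model has to learn to interpret.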

However, if it were later taught how many words various combinations of tokens correspond to, it would be able to do that. So perhaps this is what it was taught in the meantime, resulting in the better ability to count words?

Thirdly, GPT with Vision can count objects in an image very well, no matter what the objects are. Does it make mistakes? Sometimes, when objects are not clearly visible, but so would humans and pigeons.