Hacker News new | ask | show | jobs
by gjm11 302 days ago
I have a PhD, in mathematics, from a top university. If you give me, say, 100 10-digit numbers to add up and tell me to do the job in my head then I will probably get the answer wrong.

Of course, if you give me 100 10-digit numbers to add up and let me use a calculator, or pencil and paper, then I will probably get it right.

Same for, say, two 100-digit numbers. (I can probably get that one right without tools if you obligingly print them monospaced and put one of them immediately above the other where I can look at them.)

Anyway, the premise here seems to be simply false. I just gave ChatGPT and Claude (free versions of both; ChatGPT5, whatever specific model it routed my query to, and Sonnet 4) a list of 100 random 10-digit numbers to add up, with a prompt encouraging them to be careful about it but nothing beyond that (e.g., no specific strategies or tools to use), and both of them got the right total. Then I did the same with two 100-digit numbers and both of them got that right too.

1 comments

https://i.imgur.com/l2elIAv.png

Difficulty is the amount of digits, small models struggle with 10 digits numbers, gemini and gpt-5 are very good recent models, gemini start failing before 40 digits, GPT-5 (the one by api, the online chat version is worse and I didn't tested it) can do more than 120 digits (at this point it's pointless to test for more).

My tests of GPT-5 were using the online chat version.

Of course, I only ran it once; I can't at all rule out the possibility that sometimes it gets it wrong. But, again, the same is true of humans.

The online version is way worse, it also have a router that could route it to a random model.