| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mg 221 days ago

That makes me wonder if we could simply test this by letting the LLM add or multiply a long list of numbers?

Here is an experiment:

https://www.gnod.com/search/#q=%23%20Calcuate%20the%20below%...

The correct answer:

    Correct:    20,192,642.460942328

Here is what I got from different models on the first try:

    ChatGPT:    20,384,918.24
    Perplexity: 20,000,000
    Google:     25,167,098.4
    Mistral:    200,000,000
    Grok:       Timed out after 300s of thinking

4 comments

gcanyon 221 days ago

> Do not use a calculator. Do it in your head.

You wouldn't ask a human to do that, why would you ask an LLM to? I guess it's a way to test them, but it feels like the world record for backwards running: interesting, maybe, but not a good way to measure, like, anything about the individual involved.

link

throwuxiytayq 221 days ago

I’m starting to find it unreasonably funny how people always want language models to multiply numbers for some reason. Every god damn time. In every single HN thread. I think my sanity might be giving out.

link

solatic 221 days ago

A model, no, but an agent with a calculator tool?

Then there's the question of why not just build the calculator tool into the model?

link

KristoAI 221 days ago

Since grok 4 fast got this answer correct so quickly, I decided to test more.

Tested this on the new hidden model of ChatGPT called Polaris Alpha: Answer: $20,192,642.460942336$

Current gpt-5 medium reasoning says: After confirming my calculations, the final product (P) should be (20,192,642.460942336)

Claude Sonnet 4.5 says: “29,596,175.95 or roughly 29.6 million”

Claude haiku 4.5 says: ≈20,185,903

GLM 4.6 says: 20,171,523.725593136

I’m going to try out Grok 4 fast on some coding tasks at this point to see if it can create functions properly. Design help is still best on GPT-5 at this exact moment.

link

jarek83 221 days ago

Isn't that LLMs are not designed to do calculations?

link

cluckindan 221 days ago

They are not LMMs, after all…

Neither are humans.

But humans can still do it.

link