| HN Mirror


by simianwords 118 days ago

I don't think finding the worst model and asking it to do things is the best way to go.

GPT-5 pro without tools can easily solve your question and much harder ones.

Rather, a better way to falsify this claim is to ask whether any model exists that can perform these calculations reliably.

Otherwise we can always pick the worst 1B-parameter model to falsify any claim made about LLMs.

3 comments

AlexeyBrin 118 days ago

How do you know GPT-5 does not call a Python interpreter remotely on OpenAI's servers when you ask it to do arithmetic? Your prompt goes to their servers, and you have no way to know what happens there.

The only way to be sure a model calls no tool is to run it locally and control the network.


simianwords 118 days ago

It’s documented that the Pro version in the API can’t run the code interpreter. Do you not believe the documentation?


floam 118 days ago

They don’t give the customer access to it, probably for an obvious reason.


chickenimprint 118 days ago

I chose Mistral because it's a large SOTA model that allows me to disable tool calling.

If you try using ChatGPT, you'll see it goes into reasoning mode (it flashes the text "Working") and takes a while when asked to do arithmetic. Chatbots that let the user view the reasoning chain of thought, such as DeepSeek or Mistral, reveal that they work exactly as I described:

====

Thought for 1s

Thinking

Okay, the user wants to multiply two large numbers: 63,157,997,633 and 6,311,490,009. Since these numbers are quite large, I should use the code interpreter to handle this calculation accurately. Let's write a simple piece of code to compute this product.

Code (python):

  # Calculating the product of the two numbers
  number1 = 63157997633
  number2 = 6311490009
  product = number1 * number2
  product

Result:

  398621071049125148697

Thinking

Great, the code interpreter has given us the product of the two numbers. Now, let's present the result to the user.

====

Falsifying my claim is simple: Find a model that verifiably does basic arithmetic correctly without relying on any external tools.
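A check like this is easy to automate once you have a model's reply as text. The sketch below is only an illustration and was not used anywhere in this thread: `parse_int` and `check_product` are hypothetical helpers that compare a reply against Python's exact integer product, tolerating thousands separators.

```python
import re


def parse_int(text: str) -> int:
    """Parse a reply such as '398,621,071' into an int (hypothetical helper)."""
    cleaned = re.sub(r"[,_\s]", "", text.strip())
    if not re.fullmatch(r"-?\d+", cleaned):
        raise ValueError(f"not an integer: {text!r}")
    return int(cleaned)


def check_product(a: int, b: int, reply: str) -> bool:
    """Return True only if the reply is exactly a * b."""
    try:
        return parse_int(reply) == a * b
    except ValueError:
        # Replies that are not a bare integer count as wrong.
        return False
```

Python integers are arbitrary precision, so the reference product is exact no matter how large the operands are. Whether the model itself used a tool still has to be established separately, for example by running it locally.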


kgeist 118 days ago

I run Qwen3-32b locally without any tools (just llama.cpp), and it can do basic arithmetic on smaller numbers (like 134566), but I haven't checked much larger numbers. I'm not at the PC right now, but when I try it via OpenRouter on much larger numbers, it overflows the context and stops without giving a result :)


simianwords 118 days ago

GPT-5 Pro in the API does not support the code interpreter tool. Is this enough?


throw310822 118 days ago

Can you do basic arithmetic correctly without relying on external tools?


bdangubic 118 days ago

I can, since roughly first grade. But I cost significantly north of $200/month.


throw310822 118 days ago

Then please multiply 13584638263947303 by 259472845392638 without using any tools (that is, in your head). Get back to me when you're done.


bdangubic 118 days ago

Without “tools”? Easy, I have pen and paper and first-grade math :)


jakogut 118 days ago

I think the point of the line of questioning is to illustrate that "tools" like a code interpreter act as scratch space for models to do work in, because the reasoning/thinking process has limitations much like our own.
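To make the scratch-space idea concrete, here is a minimal sketch of schoolbook long multiplication on digit strings. It is an illustration only, not a description of how any model works internally; `long_multiply` and its `scratch` list are names made up for this example.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication of two non-negative decimal digit strings."""
    x = [int(d) for d in reversed(a)]  # least significant digit first
    y = [int(d) for d in reversed(b)]
    scratch = [0] * (len(x) + len(y))  # the "paper": one cell per result digit
    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = scratch[i + j] + dx * dy + carry
            scratch[i + j] = total % 10
            carry = total // 10
        # This cell has not been written yet, so the carry fits in one digit.
        scratch[i + len(y)] += carry
    digits = "".join(str(d) for d in reversed(scratch)).lstrip("0")
    return digits or "0"
```

For the 17-digit by 15-digit example above, the loop performs 255 single-digit multiplications, each with a carry to track. That is trivial with somewhere to write intermediate digits and very error-prone without it.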


jdlshore 118 days ago

Enough with the whataboutism. The topic is what LLMs are capable of, not what humans are capable of.


rybosworld 118 days ago

> GPT-5 pro without tools can easily solve your question and much harder ones.

How are you able to use GPT-5 with tools turned off? Do you mean external tools (like searching the web)?

My understanding is that GPT models always have access to Python, and it isn't something you can turn off.


simianwords 118 days ago

What if we use the API? You can explicitly disable tool calls. Is that enough?
