
The results from playing with this are really bizarre (sorry, formatting hacked up a bit):

To calculate 7^1.83, you can use a scientific calculator or an exponentiation function in programming or math software. Here is the step-by-step calculation using a scientific calculator:

1. Input the base: 7
2. Use the exponentiation function (usually labeled as ^ or x^y).
3. Input the exponent: 1.83
4. Compute the result.

Using these steps, you get:

7^1.83 ≈ 57.864

So, 7^1.83 ≈ 57.864

Given this, and the recently announced data analysis features, I’m guessing GPT-4o is wired up to use various tools, one of which is a calculator. Except that, if you ask it, it blatantly lies about how it’s using a calculator, and it sometimes makes up answers (e.g. 57.864, which is off by quite a bit: the actual value is about 35.2).
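For anyone who wants to check the quoted figure, here is a minimal sketch in Python. The variable names are mine; the only inputs are the base, exponent, and claimed result from the answer quoted above.

```python
import math

claimed = 57.864              # value from the quoted answer
actual = math.pow(7, 1.83)    # equivalent to 7 ** 1.83

# Relative error of the claimed value against the computed one.
relative_error = abs(claimed - actual) / actual

print(f"actual:  {actual:.3f}")
print(f"claimed: {claimed:.3f}")
print(f"relative error: {relative_error:.0%}")
```

The computed value comes out a little above 35, so the quoted answer is too large by well over half.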

I imagine some trickery: the LLM has been trained to output math in a format that the front end can pretty-print, and an intermediate system tries (and doesn’t always succeed) to recognize things like “expression =” and emits the tokens for the correct value into the response stream. When it works, great: the LLM magically has correct arithmetic in its output! When it fails, the LLM cheerfully hallucinates.
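To make the guess concrete, here is a rough sketch of what such an intermediate step could look like. This is purely hypothetical: the function name `patch_arithmetic`, the regex, and the choice to handle only `a^b = value` patterns are my assumptions, not anything known about how GPT-4o actually works.

```python
import re

# Hypothetical pattern: "<number>^<number>" followed by "=" or "≈" and a number.
POWER_EQUATION = re.compile(
    r"(?P<base>\d+(?:\.\d+)?)\s*\^\s*(?P<exp>\d+(?:\.\d+)?)"
    r"\s*(?P<rel>=|≈)\s*(?P<claimed>\d+(?:\.\d+)?)"
)

def patch_arithmetic(text: str, digits: int = 3) -> str:
    """Replace the number after a recognized power expression with the computed value."""
    def fix(match: re.Match) -> str:
        value = float(match["base"]) ** float(match["exp"])
        return f"{match['base']}^{match['exp']} {match['rel']} {value:.{digits}f}"

    # Text that does not match the pattern passes through unchanged,
    # which is where a hallucinated number would survive.
    return POWER_EQUATION.sub(fix, text)

recognized = patch_arithmetic("So, 7^1.83 ≈ 57.864")
missed = patch_arithmetic("Seven to the power 1.83 is 57.864")
print(recognized)
print(missed)
```

In this sketch the first string gets its number overwritten, while the second is phrased in a way the pattern does not recognize, so the wrong value goes through untouched. That would match the behavior of correct arithmetic in some responses and made-up numbers in others.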