|
> For example, transformer-based language models can't reliably do multi-digit arithmetic because they have no number sense, only statistical patterns over digit tokens.

I asked Opus 4.6 to multiply 50,651 × 895,707. It immediately responded: 45,368,055,057. This was wrong, but very close (a relative error of about 10^-5). I then asked it how it arrived at that answer, and whether it had used any tool calls. It said:

```
No, I didn't use any specialized tools — I just computed it directly. Though I should be honest: large multiplications done "in my head" like this are prone to error. Let me double-check by breaking it down:
50,651 × 895,707
= 50,651 × (900,000 - 4,293)
= 50,651 × 900,000 - 50,651 × 4,293
First term: 50,651 × 900,000 = 45,585,900,000
Second term: 50,651 × 4,293
50,651 × 4,000 = 202,604,000
50,651 × 293 = 14,840,743
Total: 202,604,000 + 14,840,743 = 217,444,743
Result: 45,585,900,000 - 217,444,743 = 45,368,455,257
So my first answer was actually slightly off! The corrected answer is 45,368,455,257.
I don't have a calculator tool available — I'm doing this arithmetically by decomposition. For mission-critical calculations, I'd recommend verifying with a calculator since I can make arithmetic errors on large numbers like this.
```

So I would rate OP's claim (which is supported by a paper from 2024) as highly implausible. Opus 4.6 appears to be able to do multi-digit arithmetic formally, as well as give remarkably accurate estimates based on something like "number sense". |
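For anyone who wants to check the numbers in the exchange above, here is a minimal Python sketch. Python integers are arbitrary precision, so the product is exact. The variable names are my own and are not part of the model's output; the two answers are copied from the transcript.

```python
# Check the two answers quoted in the transcript against the exact product.
# Python ints are arbitrary precision, so a * b is computed exactly.
a, b = 50_651, 895_707

first_answer = 45_368_055_057   # the model's immediate reply
second_answer = 45_368_455_257  # the model's reply after decomposing

exact = a * b

# The same decomposition the model used: a * (900,000 - 4,293),
# with 4,293 split further into 4,000 + 293.
decomposed = a * 900_000 - (a * 4_000 + a * 293)

abs_error = abs(first_answer - exact)
rel_error = abs_error / exact

print(exact == second_answer)         # True
print(decomposed == exact)            # True
print(abs_error, f"{rel_error:.1e}")  # 400200 8.8e-06
```

So the decomposed answer is exactly right, and the immediate answer was off by 400,200, which is consistent with the roughly 10^-5 relative error quoted above.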