Why can't a next token predictor do math? Humans aren't calculators either, but we can do math.
If you want proof, just look at the benchmarks. Modern frontier models can get near-perfect accuracy on American Invitational Mathematics Examination (AIME) tests: https://matharena.ai/?comp=aime--aime_2026
But LLMs can write code that can do math and count. Tool use, more broadly, has proven to be a very powerful way to let LLMs do what they're good at (handling the fuzzy and imprecise nuances of natural language, which includes scooping up a lot of context) and delegate the things they're not good at to external tools, some of which they can write on the spot.
If you think about it, we humans do that all the time too.
I'm crap at 4-digit multiplication in my head, but I have no problem doing that with pencil and paper.
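To make the pencil-and-paper point concrete, here is a rough sketch of the schoolbook procedure written as code, the kind of throwaway tool a model could plausibly write on the spot. The function name `long_multiply` is made up for illustration, and it assumes non-negative integers.

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook multiplication of two non-negative integers, digit by digit."""
    # Store digits least-significant first, like working right to left on paper.
    a_digits = [int(d) for d in reversed(str(a))]
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))

    for i, da in enumerate(a_digits):
        carry = 0
        for j, db in enumerate(b_digits):
            # Add the single-digit product to whatever is already in this column.
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever is left over goes into the next column to the left.
        result[i + len(b_digits)] += carry

    return int("".join(str(d) for d in reversed(result)))


product = long_multiply(1234, 5678)
# Sanity check against the language's built-in multiplication.
assert product == 1234 * 5678
```

Every step is a single-digit multiply plus a carry, which is exactly why the paper version is easy even when the in-your-head version is not.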
> But LLMs can write code that can do math and count.
They cannot, however, execute that code. They can feed that code into an external program they've been given access to, but they can't execute it themselves.
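For what that split looks like in practice, here is a minimal sketch, assuming the model's output is just a string of Python and the surrounding harness is what actually runs it. `run_generated_code` and `model_output` are hypothetical names, no real model is called, and a separate interpreter process is not a security sandbox.

```python
import subprocess
import sys


def run_generated_code(source: str, timeout_s: float = 5.0) -> str:
    """Run model-written Python in a separate interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        timeout=timeout_s,    # give up if the generated code hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


# Stand-in for text a model might emit (an assumption for illustration).
model_output = "print(4821 * 7395)"

# The harness, not the model, executes the code; the result would then be
# handed back to the model as more text.
answer = run_generated_code(model_output)
```

The model only ever produces and consumes text here; the arithmetic happens in the interpreter that the harness launches.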
You presumably have no problem getting around in a car that you control only indirectly, via a steering wheel, an accelerator, and a brake pedal, without ever actually powering the wheels yourself.