Why can't a next-token predictor do math? Humans aren't calculators either, but we can do math.
If you want proof, just look at the benchmarks. Modern frontier models get near-perfect accuracy on American Invitational Mathematics Examination (AIME) problems: https://matharena.ai/?comp=aime--aime_2026
If you want an explanation of how they do math, we've found geometric calculators inside their neural networks: https://www.goodfire.ai/research/a-geometric-calculator#