Okay, but we have since invented machines that can do arithmetic correctly, every time. When we try to do maths via an LLM, we're just throwing all of that away.
So? I didn't tell you to use GPT-4 for arithmetic instead of a calculator. I simply pointed out that the only standard by which GPT-4 is not good at arithmetic is one that humans wouldn't meet either. Especially since zero-shot "mental" arithmetic is not even close to GPT-4 at its most accurate.
The discussion started with "what would it take to convince people that [insert favourite LLM] is good at maths", and the response to that, IMHO, is that we have much better tools for arithmetic (I don't even want to say maths), even if humans themselves are also poor at arithmetic.
What's the point of building a system that is as bad as humans at something we know humans are bad at? LLMs have their uses, but (at least at the current stage) performing arithmetic calculations is not one of them (to say nothing of more advanced mathematics).