Why not exactly? Why couldn't the model weights encode some sort of math engine? Why couldn't the hidden state encode carryover values? Why do we assume these things can't happen at some level?
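To make the carry point concrete, here's a minimal sketch in plain Python. It says nothing about what any actual model does internally, and `add_with_carry` is just a name I made up for illustration. The only thing it shows is how little state schoolbook addition needs: one carry value handed from each digit step to the next.

```python
def add_with_carry(a: str, b: str) -> str:
    """Add two non-negative decimal strings digit by digit, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0  # the only state passed from one step to the next
    digits = []
    for da, db in zip(reversed(a), reversed(b)):
        # Each step sees two digits plus the incoming carry,
        # and emits one digit plus the outgoing carry.
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))

    if carry:
        digits.append("1")
    return "".join(reversed(digits))
```

So the question is whether a hidden state can hold something equivalent to that one `carry` variable, not whether it can hold an entire calculator.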
I agree. The reasoning is there, and it's becoming more capable every year (across the various models). It's easy to look for limitations, but what were once glaring problems are now much more subtle.