If being probabilistic prevented learning deterministic functions, transformers couldn’t learn addition either. But they can, so that can't be the reason.
Are you sure? I bet that if you pull 10 people off the street and ask them to multiply two 5-digit numbers by hand, you won't get a 100% success rate.
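
To put rough numbers on that, here is a minimal sketch. It assumes every single-digit step has to be right and that mistakes are independent, and the per-step accuracies are made-up values for illustration, not measurements. The function name `success_probability` is just something I picked.

```python
def success_probability(step_accuracy: float, num_steps: int) -> float:
    """Chance of a fully correct answer when every step must be right.

    Assumes steps fail independently of each other (a simplification).
    """
    return step_accuracy ** num_steps


# Schoolbook 5-digit x 5-digit multiplication needs 25 single-digit products.
# Carries and the final column additions are ignored, so this undercounts steps.
NUM_STEPS = 5 * 5

# Hypothetical per-step accuracies, chosen only to show the trend.
for accuracy in (0.999, 0.99, 0.95):
    p = success_probability(accuracy, NUM_STEPS)
    print(f"per-step accuracy {accuracy}: whole-problem success {p:.3f}")
```

Even a small per-step error rate compounds over 25 steps, so being capable of exact arithmetic in principle doesn't guarantee a perfect score in practice.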