If an LLM uses a calculator to arrive at an answer, does that make it worse than a model that can infer the answer without using a tool?