| Claude and ChatGPT have thinking-effort settings (low, medium, high, xhigh and so on) that let you tune how much thinking is allowed. But are these different models underneath, or the same model with a different parameter? The reason I ask is that if I change the effort param mid-conversation in Claude Code, I get a warning suggesting I’m breaking the cache. I don’t think this happens in Codex, because when I change the effort, the responses are still quick. |
The "amount of thinking" is how long this internal conversation is allowed to run. The longer it runs, the more it costs. It all counts against the token budget, but because the internal dialogue is hidden, the extra cost isn't obvious to the end user.
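To make the cost point concrete, here is a rough sketch of how hidden thinking can dominate the bill for a single turn. It assumes hidden reasoning tokens are billed at the same rate as visible output tokens. The `Turn` class, the `turn_cost` helper, the token counts, and the prices are all made up for illustration and are not any provider's actual API or price list.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    input_tokens: int      # prompt plus conversation history
    thinking_tokens: int   # hidden internal reasoning (hypothetical count)
    visible_tokens: int    # the answer the user actually sees


def turn_cost(turn: Turn, input_price: float, output_price: float) -> float:
    """Cost of one turn in dollars; prices are per million tokens.

    Assumption: thinking tokens are billed like ordinary output tokens.
    """
    billed_output = turn.thinking_tokens + turn.visible_tokens
    total = turn.input_tokens * input_price + billed_output * output_price
    return total / 1_000_000


# Hypothetical prices, chosen only to show the ratio.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

# Same prompt and same visible answer length, different thinking effort.
low_effort = Turn(input_tokens=2_000, thinking_tokens=500, visible_tokens=300)
high_effort = Turn(input_tokens=2_000, thinking_tokens=8_000, visible_tokens=300)

low_cost = turn_cost(low_effort, INPUT_PRICE, OUTPUT_PRICE)
high_cost = turn_cost(high_effort, INPUT_PRICE, OUTPUT_PRICE)

print(f"low effort:  ${low_cost:.4f}")
print(f"high effort: ${high_cost:.4f}")
```

Both turns show the user the same 300-token answer, yet in this sketch the high-effort turn costs several times more, and the only difference is the hidden part.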