Hacker News new | ask | show | jobs
by shanewei 4 days ago
My understanding is that it’s mostly an inference-time knob, not different weights.

OpenAI describes reasoning.effort as controlling how many reasoning tokens get used before the answer. Anthropic’s docs are even more explicit that effort trades off thoroughness vs token efficiency “with a single model”.

So I wouldn’t read the Claude Code cache warning as proof that a different model is being used. It may just mean the thinking/effort setting is part of the cache key.