I was benchmarking some models via OpenRouter the other day, and I got the distinct impression that some of them treat the thinking token budget as a target rather than a maximum.