The tokenizer is an important part of overall model training and performance. It’s only one piece of the overall cost per request. If a tokenizer that produces more tokens also leads to a model that gets to the correct answer more quickly and requires fewer re-prompts because it didn’t give the right answer, the overall cost can still be lower.
Comparisons are still ongoing but I have already seen some that suggest that Opus 4.7 might on average arrive at the answer with fewer tokens spent, even with the additional tokenizer overhead.
How would it be a money grab? If the new tokenizer requires more tokens to encode the same information, it costs them more money for inference. The point of charging per token is that the cost is proportional to the number of tokens. That's my understanding anyway
If model provider believes they have a better model, it can be a viable bet. But many (me included) started experimenting with other providers because of enshittification from Anthropic (price + uptime). Only to find, that Codex is not that worse in quality for a significantly more output per $.
There are no conspiracies where a corporation has profit incentive. There is perhaps a question of planning and initial intentionality, but the metrics and motivation to continue are clear enough.
Not necessarily with speculative decoding. Whitespace would be trivial to predict and they would petty much keep using the same amount of compute as before.
I don't think that's their primary motive for doing this but it is a side effect.
If they wanted they could always just double the $/token. They don't seem to be able to keep up with their current demand and that's what companies normally do in that circumstance if they're looking to money grab, no need for the bankshot approach.
Comparisons are still ongoing but I have already seen some that suggest that Opus 4.7 might on average arrive at the answer with fewer tokens spent, even with the additional tokenizer overhead.
So, no, not a money grab.