There are a few major problems with the article. The most obvious is that frontier labs are not charging anywhere near the cost of serving tokens; afaik most estimates put their margins north of 80%. As a reference point, providers are profitably serving Kimi K2.6 at $4/1Mtok out. Is that as good as Opus? No, but it's probably at least Sonnet level, so that's ~4x cheaper than Sonnet while still being profitable to serve on the margin. At an 80% margin, serving costs are about a fifth of list price, so you aren't plausibly in actual subsidization territory until a subscriber's usage, priced at API list rates, exceeds roughly 5x what they pay for the subscription.

How many tokens can you realistically burn through in one chat session? Opus and many other frontier models output maybe 60 tok/s, so less than 250k tok/hr (60 × 3,600 = 216k). You can push through far more input, but in most cases cached input is 5-10x cheaper than fresh input. Say you average 500ktok in per request, with 90% of it cached. That comes to ~100-150ktok of fresh-input-equivalent cost, which in most cases is ~20-30ktok of output-equivalent cost. Do a request every minute and that's about 1.5-2Mtok/hr of output-equivalent spend. At API prices that's ~$50/hr for Opus, but it probably only costs Anthropic ~$10/hr to serve.

That said, even a developer burning $50/hr (about $100k/yr over ~2,000 working hours) can be worth it for the most expensive employees: a 20-30% productivity gain covers that spend for someone costing roughly $330-500k/yr all in. The stronger argument is what happens if the labs ultimately shave their margins to more like 20-30%: you'd then pay ~$15/hr to use the services, or ~$30k/yr, and nearly every white-collar job costs way more than that to employ. If your salary is $80k, you probably cost the company $200k all in, so making you 15% more productive offsets the $15/hr cost.

So first-party providers are not in some horrifying position from a subsidization standpoint. The people in bad shape are Cursor and Perplexity, who don't have frontier models and depend on the open-source community, which is typically 6-12 months behind the frontier.
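The arithmetic above is easy to fumble, so here is a minimal Python sketch that reproduces it. Every default is an assumption taken from or inferred from the comment: ~60 tok/s output, 500k-token requests with 90% cache hits, cached input 5-10x cheaper than fresh input, output priced at 5x input, a hypothetical $25/Mtok output list price for Opus, an 80% margin, and a 2,000-hour working year. The function names are illustrative, not any provider's API.

```python
"""Back-of-envelope economics of one hour of agentic coding.

All defaults are assumptions drawn from the comment, not from a price sheet;
the $25 per million output tokens figure is hypothetical.
"""

SECONDS_PER_HOUR = 3600
WORK_HOURS_PER_YEAR = 2000  # assumed full-time working year


def output_equivalent_per_request(context_tokens=500_000, cache_hit_rate=0.9,
                                  cache_discount=10, output_price_multiple=5):
    """Express one request's input cost in output-token units."""
    cached = context_tokens * cache_hit_rate
    fresh = context_tokens - cached
    # Cached tokens cost 1/cache_discount of a fresh input token.
    fresh_input_equivalent = fresh + cached / cache_discount
    # An output token costs output_price_multiple fresh input tokens.
    return fresh_input_equivalent / output_price_multiple


def hourly_list_price(requests_per_hour=60, output_tok_per_s=60,
                      usd_per_m_output=25.0, **request_kwargs):
    """List-price dollars per hour: input (as output-equivalent) plus generated output."""
    input_part = requests_per_hour * output_equivalent_per_request(**request_kwargs)
    output_part = output_tok_per_s * SECONDS_PER_HOUR  # upper bound: model never idles
    return (input_part + output_part) / 1e6 * usd_per_m_output


def serving_cost(list_price, margin=0.8):
    """Provider's cost to serve, given its gross margin on list price."""
    return list_price * (1 - margin)


def subsidy_threshold(margin=0.8):
    """List-price usage per subscription dollar before the provider loses money."""
    return 1 / (1 - margin)


def breakeven_loaded_cost(usd_per_hour, productivity_gain):
    """Fully loaded annual employee cost at which the gain pays for the tooling."""
    return usd_per_hour * WORK_HOURS_PER_YEAR / productivity_gain


if __name__ == "__main__":
    for discount in (10, 5):
        price = hourly_list_price(cache_discount=discount)
        print(f"cache {discount}x cheaper: list ${price:.0f}/hr, "
              f"serving ~${serving_cost(price):.0f}/hr")
    print(f"subsidy starts past {subsidy_threshold():.0f}x the subscription price")
    print(f"break-even at $15/hr, 15% gain: ${breakeven_loaded_cost(15, 0.15):,.0f}/yr")
    print(f"break-even at $50/hr, 25% gain: ${breakeven_loaded_cost(50, 0.25):,.0f}/yr")
```

With these inputs it lands at roughly $34-47/hr at list price and $7-9/hr to serve, in the same ballpark as the ~$50 and ~$10 figures above. The break-even helper shows that $15/hr at a 15% gain needs a $200k loaded cost, while $50/hr at a 25% gain needs $400k.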
They have to pay full-freight API prices, 80% margin included, to have the big boys' models power their harnesses, which is indeed untenable. They'll have to either push users onto open-source and/or in-house models they can serve at cost, or charge vastly more. Gemini, Claude, and ChatGPT's first-party tools like Antigravity, Claude Code, and Codex are not in serious trouble, though.
This all becomes extremely visible when you try agentic coding with local language models: you quickly realize that controlling context length and model size matters just as much as avoiding wasted effort. The real scam is not AI Q&A à la ChatGPT; that's actually quite viable, though marginally less so as conversations grow longer. It's agentic coding with SOTA models and huge contexts.
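To see why context length dominates, here is a minimal Python sketch with made-up numbers. It assumes an agent that resends its full history every turn with no prompt caching, a hypothetical 20k-token base context (system prompt plus repo files), and 5k tokens appended per turn; `max_context` stands in for whatever compaction or pruning a harness might do. None of these names come from a real tool.

```python
"""Cumulative input tokens over an agentic session.

Assumptions (illustrative only): every turn resends the whole history, there is
no prompt caching, and each turn appends a fixed number of tokens.
"""


def total_input_tokens(turns, base_tokens, tokens_per_turn, max_context=None):
    """Sum of the context lengths sent on each turn of a session."""
    total = 0
    for turn in range(1, turns + 1):
        # Turn t carries the base context plus everything from the t-1 earlier turns.
        context = base_tokens + (turn - 1) * tokens_per_turn
        if max_context is not None:
            # Hypothetical compaction: the harness never sends more than max_context.
            context = min(context, max_context)
        total += context
    return total


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(f"{n} turns: {total_input_tokens(n, 20_000, 5_000):,} input tokens")
    capped = total_input_tokens(40, 20_000, 5_000, max_context=60_000)
    print(f"40 turns, capped at 60k context: {capped:,} input tokens")
```

Without a cap the total is n·B + k·n(n−1)/2 for base B and per-turn growth k, so doubling from 20 to 40 turns takes the session from 1.35M to 4.7M input tokens, while capping the context at 60k brings the 40-turn session down to 2.22M. Caching discounts the repeated prefix, but it is still billed (or, locally, still occupies KV-cache memory), so the shape of the curve survives.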