I think you miss 2 big aspects:
1. High-volume providers get efficiencies that low-volume ones do not. They come both from more workload giving more optimization opportunities, and from having the staff to do better engineering to begin with. The result is that a price that is break-even for lower-volume firms is profitable for higher-volume ones, and since high volume means orders of magnitude more scale, this quickly pays for many people. Being the high-volume API lets you play this game. If they choose not to bother, it is likely because of strategic views on opportunity cost, not inability. That's not even the interesting analysis, which is what the real stock value is, or whatever corp structure scheme they're doing nowadays:
2. Growth for growths sake. Uber was exactly this kind of growth-at-all-costs play, going further into debt with every customer and fundraise. My understanding is that they were able to tame costs and find side businesses (delivery, ...), with the threat becoming more about the category shift to self-driving. By having the channel, they could be the one to monetize as that got figured out better. Whether tokens or something else becomes what is charged for at the profit layers (with break-even tokens as a cost of doing business), or subsidization ends and competitive pricing dominates, being the user interface to chat and the API interface to devs gives them channel. Historically, it takes a lot of hubris to believe channel is worthless, especially in an era of fast cloning.
But paid-per-token APIs at negative margins do not provide scaling efficiencies! It's just the provider giving away a scarce resource (compute) for nothing tangible in exchange. Whatever you're able to do with that extra scale, you would have been able to do even better if you hadn't served this traffic.
In contrast, the other things you can use the compute for have a real upside for some part of the genai improvement flywheel:
1. Compute spent on free users gives you training data, allowing the models to be improved faster.
2. Compute spent on training allows the models to be trained, distilled and fine-tuned faster. (Could be e.g. via longer training runs or by being able to run more experiments.)
3. Compute spent on paid inference with positive margins gives you more financial resources to invest.
Why would you intentionally spend your scarce compute on unprofitable inference loads rather than the other three options?
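To put rough numbers on that question, here's a minimal sketch. Every figure in it (the GPU-hour cost, the two revenue rates, the size of the compute pool) is made up for illustration and does not come from any real provider, and `inference_profit` is just a hypothetical helper. It assumes cost per GPU-hour stays flat as volume grows, which is the simplest case and ignores any efficiency gains from scale.

```python
# All numbers below are hypothetical, chosen only to illustrate the argument.

def inference_profit(gpu_hours: float,
                     revenue_per_gpu_hour: float,
                     cost_per_gpu_hour: float) -> float:
    """Profit (or loss) from spending `gpu_hours` of compute on paid inference."""
    return gpu_hours * (revenue_per_gpu_hour - cost_per_gpu_hour)

COST = 2.00      # assumed all-in cost of one GPU-hour, in dollars
BUDGET = 1_000   # assumed fixed pool of GPU-hours available

# Same compute pool, sold below cost vs. above cost.
negative_margin = inference_profit(BUDGET, revenue_per_gpu_hour=1.50, cost_per_gpu_hour=COST)
positive_margin = inference_profit(BUDGET, revenue_per_gpu_hour=3.00, cost_per_gpu_hour=COST)

# Doubling the below-cost traffic doubles the loss under this flat-cost assumption.
negative_margin_2x = inference_profit(2 * BUDGET, revenue_per_gpu_hour=1.50, cost_per_gpu_hour=COST)

# What the below-cost traffic gives up relative to the profitable alternative.
opportunity_cost = positive_margin - negative_margin

print(negative_margin, positive_margin, negative_margin_2x, opportunity_cost)
```

Under these assumptions, more negative-margin volume only makes the loss bigger. The scaling-efficiency argument has to show that cost per GPU-hour falls fast enough with volume to flip the sign.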
> 2. Growth for growths sake.
That's fair! It could in theory be a "sell $2 for $1" scenario from the frontier labs that are just trying to pump up their revenue numbers to fund-raise from dumb money who don't think to at least check on the unit economics. OpenAI's latest round certainly seemed to be coming from the dumbest money in the world, which would support that.
I have two rebuttals:
First, it doesn't explain Google, who a) aren't trying to raise money, b) aren't breaking out genai revenue in their financials, so pumping up those revenue numbers would not help at all. (We don't even know how much of that revenue is reported under Cloud vs. Services, though I'd note that the margins have been improving for both of those segments.)
Second, I feel that this hypothetical, even if plausible, is trumped by DeepSeek publishing their inference cost structure. The margins they claim for the paid traffic are high by any standard, and they're usually one of the cheaper options at their quality level.