by princealiiiii 383 days ago
Any app built on top of these model providers could become a competitor to these providers. Since the model providers are currently in the lowest-margin part of the business, it is likely they will try to expand into the app layer and start pulling the rug from under these businesses built on top of them.

Amazon had a similar tactic, where it would use other sellers on its marketplace to validate market demand for products, and then produce its own cheap copies of the successes.


The model providers are not in the low margin part of the business. The unit economics of paid-per-token APIs are clearly favorable, and scale amazingly well as long as you can procure enough compute.

I think it's the subscription-based models that are tricky to make work in the long term, since they suffer from adverse selection. Only the heaviest users will pay for a subscription, and those are the users that you either lose money on or make unhappy with strict usage limits. It's kind of the inverse of the gym membership model.

Honestly, I think the subscriptions are mainly used as a demand moderation method for advanced features.
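
To make the adverse-selection argument concrete, here is a minimal Python sketch. Everything in it is a made-up illustration: the per-user monthly serving costs, the $20 flat price, and the rule that a user subscribes only when their usage would cost more than the flat price are all assumptions, not data from any provider.

```python
def margin_per_user(costs, price):
    """Average provider margin per user at a flat subscription price."""
    return price - sum(costs) / len(costs)


# Hypothetical monthly serving cost per user, in dollars (illustrative only).
costs = [1, 2, 3, 5, 8, 15, 30, 40, 50]
price = 20

# If every user subscribed, light users would subsidize heavy ones.
everyone = margin_per_user(costs, price)

# Toy self-selection rule (an assumption): only users whose usage costs
# more than the flat price bother to subscribe.
self_selected = [c for c in costs if c > price]
subscribers = margin_per_user(self_selected, price)

print(f"margin if everyone subscribed: {everyone:+.2f}")
print(f"margin on self-selected subscribers: {subscribers:+.2f}")
```

With these made-up numbers the flat price is profitable across the whole population but loses money on the users who actually sign up, which is the gym-membership model in reverse.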

> The model providers are not in the low margin part of the business.

Many people believe that model providers are running at negative margin.

(I don't know how true it is.)

Yes, many people believe that, but it doesn't seem to be an evidence-based belief. I've written about this in some detail[0][1] before. But since just linking to one's own writing is a bit gauche and doesn't make for a good discussion, I'll summarize :)

1. There is no point in providing paid APIs at negative margins, since there's no platform power in having a larger paid API share (paid access can't be used for training data, no lock-in effects, no network effects, no customer loyalty, no pricing power on the supply side since Nvidia doesn't give preferential treatment to large customers). Even selling access at break-even makes no sense, since that is just compute you're not using for training, or not selling to other companies desperate for compute.

2. There are 3rd-party providers selling only the compute, not models, who have even less reason to sell at a loss. Their prices are comparable to 1st-party providers.

3. Deepseek published their inference cost structure for R1. According to that data their paid API traffic is very lucrative (their GPU rental costs for inference are under 20% of their standard pricing, i.e. >80% operating margins; and the rental costs would cover power, cooling, depreciation of the capital investment).
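
The arithmetic behind point 3 is simple enough to sketch. This is only an illustration of how a cost-to-price ratio maps to a margin; the function name and the normalized price of 1.00 are assumptions, and the 20% figure is the upper bound quoted above rather than an exact number.

```python
def operating_margin(cost, price):
    """Margin as a fraction of price: (price - cost) / price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost) / price


# Illustrative boundary case: cost is exactly 20% of the list price.
# The comment says the real ratio is *under* 20%, so the margin is above this.
margin = operating_margin(cost=0.20, price=1.00)
print(f"{margin:.0%}")
```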

Insofar as frontier labs are unprofitable, I think it's primarily due to them giving out vast amounts of free access.

[0] https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...

[1] https://news.ycombinator.com/item?id=44165521

I think you miss 2 big aspects:

1. High volume providers get efficiencies that low volume do not. This comes both from more workload giving more optimization opportunities, and from having the staff to do better engineering to begin with. The result is that break-even for lower-volume firms is profitable for higher-volume ones, and since high volume means orders of magnitude more scale, this quickly pays for many people. By being the high-volume API, this game can be played. If they choose not to bother, it is likely because of strategic views on opportunity cost, not inability.

That's not even the interesting analysis, which is what the real stock value is, or whatever corp structure scheme they're doing nowadays:

2. Growth for growth's sake. Uber was exactly this kind of growth-at-all-costs play, going more into debt with every customer and fundraise. My understanding is they were able to tame costs and find side businesses (delivery, ...), with the threat becoming more about category shift of self-driving. By having the channel, they could be the one to monetize as that got figured out better.

Whether tokens or something else becomes what is charged for at the profit layers (with breakeven tokens as cost of business), or subsidization ends and competitive pricing dominates, being the user interface to chat and the API interface to devs gives them channel. Historically, it is a lot of hubris to believe channel is worthless, and especially in an era of fast cloning.

> High volume providers get efficiencies that low volume do not

But paid-per-token APIs at negative margins do not provide scaling efficiencies! It's just the provider giving away a scarce resource (compute) for nothing tangible in exchange. Whatever you're able to do with that extra scale, you would have been able to do even better if you hadn't served this traffic.

In contrast, the other things you can use the compute for have a real upside for some part of the genai improvement flywheel:

1. Compute spent on free users gives you training data, allowing the models to be improved faster.

2. Compute spent on training allows the models to be trained, distilled and fine-tuned faster. (Could be e.g. via longer training runs or by being able to run more experiments.)

3. Compute spent on paid inference with positive margins gives you more financial resources to invest.

Why would you intentionally spend your scarce compute on unprofitable inference loads rather than the other three options?

> 2. Growth for growth's sake.

That's fair! It could in theory be a "sell $2 for $1" scenario from the frontier labs that are just trying to pump up their revenue numbers to fund-raise from dumb money who don't think to at least check on the unit economics. OpenAI's latest round certainly seemed to be coming from the dumbest money in the world, which would support that.

I have two rebuttals:

First, it doesn't explain Google, who a) aren't trying to raise money, b) aren't breaking out genai revenue in their financials, so pumping up those revenue numbers would not help at all. (We don't even know how much of that revenue is reported under Cloud vs. Services, though I'd note that the margins have been improving for both of those segments.)

Second, I feel that this hypothetical, even if plausible, is trumped by Deepseek publishing their inference cost structure. The margins they claim for the paid traffic are high by any standard, and they're usually one of the cheaper options at their quality level.

I think you ignored both of my points -

1. You just negated a technical statement with... I don't even know what. Engineering opportunities at volume and high skill allow changing the margin in ways low volume and low capitalization providers cannot. Talk to any GPU ML or DC eng and they will rattle off ways here. You can claim these opportunities aren't enough, but you don't seem to be willing to do so.

2. Again, even if tokens are unprofitable at scale (which I doubt), market position means owning a big chunk of the distribution channel for more profitable things. Classic loss leader. Being both the biggest UI + API is super valuable. Eg, now that code as a vertical makes sense, they bought more UI here, and now they can go from token pricing closer to value pricing and fancier schemes - imagine taking on GitHub/Azure/Vercel/... . As each UI and API point takes off, they can devour the smaller players who were building on top to take over the verticals.

Separately, I do agree, yes, the API case risks becoming (and staying) a dumb pipe if they fail to act on it. But as much as telcos hate their situation, it's nice to be one.

There are more factors to cost than just the raw compute to provide inference. They can’t just fire everyone and continue to operate while paying just the compute cost. They also can’t stop training new models. The actual cost is much more than the compute for inference.

Yes, there are some additional operating costs, but they're really marginal compared to the cost of the compute. Your suggestion was personnel: Anthropic is reportedly on a run-rate of $3B with O(1k) employees, most of whom aren't directly doing ops. Likewise they also have to pay for non-compute infra, but it is a rounding error.

Training is a fixed cost, not a variable cost. My initial comment was on the unit economics, so fixed costs don't matter. But including the full training costs doesn't actually change the math that much as far as I can tell for any of the popular models. E.g. the alleged leaked OpenAI financials for 2024 projected $4B spent on inference, $3B on training. And the inference workloads are currently growing insanely fast, meaning the training gets amortized over a larger volume of inference (e.g. Google showed a graph of their inference volume at Google I/O -- 50x growth in a year, now at 480T tokens / month[0])

[0] https://blog.google/technology/ai/io-2025-keynote/
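
The amortization argument can be sketched numerically. The $4B and $3B figures are the alleged projections quoted above; the 10x growth factor and the assumption that training spend stays flat while inference scales are hypothetical simplifications chosen only for illustration.

```python
def training_share(inference_cost, training_cost, inference_growth=1.0):
    """Training as a fraction of total cost after inference scales up.

    Simplifying assumption: training spend stays flat while inference
    spend scales linearly with volume.
    """
    scaled_inference = inference_cost * inference_growth
    return training_cost / (scaled_inference + training_cost)


# Figures from the comment above (alleged 2024 projections, in $B).
INFERENCE, TRAINING = 4.0, 3.0

today = training_share(INFERENCE, TRAINING)
# Hypothetical 10x growth in inference volume, chosen only for illustration.
later = training_share(INFERENCE, TRAINING, inference_growth=10.0)

print(f"training share now: {today:.1%}, after 10x inference: {later:.1%}")
```

Under these assumptions training falls from a bit over 40% of total cost to well under 10%, which is the sense in which growing inference volume dilutes the fixed cost.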

That's all the more reason to run at a positive margin though - why shovel money into taking a loss on inference when you need to spend money on R&D?

I heart you.

Classic fixed / variable cost fallacy: if you look at the steel and plastic in a $200k Ferrari, it’s worth about $10k. They have 95% gross margins! Outrageous!

(Never mind the engine R&D cost, the pre-production molds that fail, the testing and marketing and product placement and…)
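
The Ferrari example can be put into numbers. The $200k price and $10k of materials come from the comment; the fixed-cost total and the unit count are hypothetical placeholders, picked only to show how a 95% gross margin shrinks once fixed costs are spread over the units sold.

```python
def gross_margin(price, unit_cost):
    """Margin counting only per-unit (variable) costs."""
    return (price - unit_cost) / price


def margin_after_fixed(price, unit_cost, fixed_costs, units_sold):
    """Margin after spreading fixed costs evenly over all units sold."""
    fixed_per_unit = fixed_costs / units_sold
    return (price - unit_cost - fixed_per_unit) / price


PRICE, MATERIALS = 200_000, 10_000          # from the comment above
FIXED_COSTS, UNITS = 1_500_000_000, 10_000  # hypothetical R&D, tooling, marketing

gross = gross_margin(PRICE, MATERIALS)
net = margin_after_fixed(PRICE, MATERIALS, FIXED_COSTS, UNITS)

print(f"gross margin: {gross:.0%}, after fixed costs: {net:.0%}")
```

How far the second number sits below the first depends entirely on the fixed-cost and volume assumptions, which is exactly what the two sides of this thread disagree about.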

They probably have been running at negative margin, or at the very least started that way. But between hardware and software developments, their cost structures are undoubtedly improving over time; otherwise we wouldn’t be seeing pricing drop with each new generation of models. In fact, I would bet that their margins are improving in spite of the price drops.

Citation needed.

Model providers spend a ton of money. It is unclear if they will ever have high margins. Today they are somewhere between zero and negative big numbers.

The subscription model is just there to serve the B2C side of the business, which in turn feeds into the B2B side.

Anthropic said themselves that enterprise is where the money is, but you can't just serve enterprise from the get-go, right?

This is where the indirect influence of B2C comes in.

What evidence do you have that there's decent margin on the APIs?

Even to the extent that’s true, that doesn’t seem to be the issue here.

OpenAI is acquiring Windsurf which is its most direct competitor.

True. Otherwise Anthropic would cut access to other code assistants too as they all compete with Claude Code.

They might still. Why not?

Illustrates a risk of building a product with these AI coding tools. If your developers don't know how to build applications without using AI, then you're at the mercy of the AI companies. You might come to work one day and find that, whether accidentally, deliberately, or as the result of a merger or acquisition, the tools you use are suddenly gone.

> If your developers don't know how to build applications without using AI, then you're at the mercy of the AI companies.

The same can be said if your developers don't know how to build applications:

- without using syntax highlighting ...

- without using autocomplete ...

- without using refactoring tools ...

- without using a debugger ...

Why do we not care about those? Because these are commodity features. LLMs are also a commodity now. Any company with a few GPUs and bandwidth can deploy the free DeepSeek or QwQ models and start competing with Anthropic/OpenAI. It may or may not be as good as Claude 4, but it won't be a catastrophe either.

Those examples are all either zero cost or "buy once, use forever." How is that an argument against outsourcing your core competency to third party in perpetuity?

It's an argument against the original argument:

> you're at the mercy of the AI companies

You are not at the mercy of anyone. There are capable, open models that are self-hostable. For example, JetBrains IDEs come with a free local model [1] that runs on your CPU, which is exactly "buy once, use forever". If you want a bit more oomph, a consumer-level Nvidia GPU or Apple Silicon is sufficient.

> outsourcing your core competency to third party

I don't think my core competency is remembering obscure syntax, or being able to perform repetitive tasks. But if that's true, then I'm screwed anyway. My employer can simply pay an AI company 100 bucks a month and fire me. My own resistance against using LLMs won't change anything. The only logical thing to do would be to accept that my core competency is no longer valuable, and find another core competency.

[1] https://www.jetbrains.com/junie/

This is true of any SaaS vendor

I 100% agree with you except your framing makes it sound like the model providers are doing something wrong.

If I spend a ton of money making the most amazing ceramic dinner plates ever and sell them to distributors for $10 each, and one distributor strikes gold in a market selling them at $100/plate, despite adding no value beyond distribution… hell yeah I’m cutting them off and selling direct.

I don’t really understand how it’s possible to see that in moral terms, let alone with the low-value partner somehow a victim.

I don’t think it is at all clear that Windsurf adds zero value. Why do you think this is a helpful analogy?

The analogy is a bit like this. Imagine that there are 100 ceramic dinner plates for $6 each. Now someone comes in and buys them from you for $5 each - undercutting your margin. Then a 3rd company comes in and literally eats your lunch on your own ceramic dinner plates. The moral of the story is any story involving ceramic dinner plates is a good one, regardless of the utility of any analogy.

Or AWS, and AWS managed services vs. other managed services on top of AWS.