by FinnLobsien 2 hours ago
The problem space has a few aspects:

1. We're still in the "$5 airport Uber" era of LLMs. They're heavily subsidized, and everyone still complains about costs.

2. There hasn't been a real incentive to work on cost optimization for data centers and the hardware they contain. When/if price hikes happen and send people scrambling to use other models or drastically reduce AI usage, this will suddenly need to happen.

3. We're massively overusing SOTA models. As long as you're on a subsidized subscription, you can use Claude Opus 4.8 high to write blog article meta descriptions. If you paid by token, you wouldn't do that.

4. Open models are a wildcard that could completely change the calculus.

6 comments

>3. We're massively overusing SOTA models. As long as you're on a subsidized subscription, you can use Claude Opus 4.8 high to write blog article meta descriptions. If you paid by token, you wouldn't do that.

This idea that the subscriptions are subsidized is repeated over and over, but I've never seen any proof of this. It seems to be entirely based on the inferred API cost the subscription usage could give you, but there are a lot of assumptions needed for that to follow.

> This idea that the subscriptions are subsidized is repeated over and over, but I've never seen any proof of this. It seems to be entirely based on the inferred API cost the subscription usage could give you, but there are a lot of assumptions needed for that to follow.

My Claude Code environment shows me the cost of the tokens used in that session, priced at API rates. It regularly exceeds $200. I pay $200 a month for my Claude subscription. That's fairly obviously subsidised, unless you genuinely believe their unit costs are 100x less than what they're charging.

The API inference cost to customers is not the actual cost of providing inference, and the cost of providing API inference need not be the cost of providing subscriber inference.
But they also determine the token prices. What you describe could also be true if they take a 5x profit margin on API tokens and a 2x margin on subscriptions.
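
A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The 5x and 2x multipliers are the hypothetical ones from this comment, not known figures, "Nx margin" is read here as price divided by cost, and the function names are made up for illustration.

```python
def implied_cost(api_priced_usage: float, api_markup: float) -> float:
    """Serving cost implied by usage valued at API list prices,
    assuming API price = api_markup * cost (an assumption, not a known figure)."""
    return api_priced_usage / api_markup


def subscription_multiple(sub_price: float, api_priced_usage: float,
                          api_markup: float) -> float:
    """Subscription price divided by the implied cost of serving that usage.
    Above 1.0 the plan covers its serving cost; below 1.0 it is subsidized."""
    return sub_price / implied_cost(api_priced_usage, api_markup)


# Hypothetical: $200/month plan, $500 of usage at API list prices, 5x API markup.
cost = implied_cost(500.0, 5.0)
multiple = subscription_multiple(200.0, 500.0, 5.0)
print(f"implied cost ${cost:.2f}, subscription sells at {multiple:.1f}x cost")
```

Under those assumed numbers, someone could burn $500 of API-priced tokens on a $200 plan and the plan would still sell at twice its serving cost. Set the markup to 1 (API price equals cost) and the same user is subsidized, which is why the session cost display alone doesn't settle the question.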
That's what they want to charge you, not the actual cost. The actual cost is a GPU that's probably already paid off and about $2 of electricity.
Most prices, like GPUs, are amortized over several years, when doing the calculus. Maybe they're already paid off, maybe they aren't. I would lean toward "aren't".
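
To make the amortization point concrete, here is a minimal sketch of straight-line amortization plus electricity. Every input below (purchase price, schedule length, power draw, electricity rate) is a made-up placeholder, and the sketch assumes the card runs around the clock and ignores cooling, networking, and staff.

```python
HOURS_PER_YEAR = 24 * 365


def hourly_gpu_cost(purchase_price: float, amortization_years: float,
                    power_kw: float, price_per_kwh: float) -> float:
    """Straight-line hardware cost per hour plus electricity cost per hour."""
    hardware_per_hour = purchase_price / (amortization_years * HOURS_PER_YEAR)
    electricity_per_hour = power_kw * price_per_kwh
    return hardware_per_hour + electricity_per_hour


# Hypothetical inputs: $30,000 accelerator, 4-year schedule, 1 kW draw, $0.10/kWh.
while_amortizing = hourly_gpu_cost(30_000, 4, 1.0, 0.10)
electricity_only = 1.0 * 0.10  # what remains once the hardware is written off

print(f"while amortizing: ${while_amortizing:.2f}/hour")
print(f"once paid off:    ${electricity_only:.2f}/hour")
```

With these placeholder numbers the hardware share dwarfs the electricity share, so whether the GPU counts as "paid off" changes the answer a lot.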
They are subsidized by the huge losses incurred by the AI companies.
Data from OpenAI shows their 2025 inference revenue exceeded their cost of inference by a good margin (https://cdn.arstechnica.net/wp-content/uploads/2026/06/opena...). Saying this is being subsidized is like saying any investment in future productive assets is "subsidized".
Anthropic have claimed they expect their first profitable quarter this year. As far as we can infer their current API prices have decent margins.
Anyone can claim they are profitable, simply by reclassifying their expenses as something else or shuffling them into a separate corporate structure. Until we see a real financial audit, the CEOs' claims are just hot air.
OpenAI's leaked documents also said OpenAI was profitable on inference. The small resellers of open models have nowhere near the resources to optimise their models or inference and yet usually have a lower cost, why wouldn't the big labs?
Only if those losses are coming from subscriptions, instead of capex and training, which is not at all clear.
This argument assumes that capex and training costs will go down over time. But they'll have to keep up with one another and stay on top of the latest knowledge, so I'm not sure that's true.
I don't understand this argument. How does it make the subscription any less subsidised if the losses are only because developing the product is just so darn expensive?

Feels like arguing that it's not clear if Bugatti's losses came from selling the Veyron instead of designing and developing the Veyron.

The equivalent is when Amazon was running a loss because they were spending all their money on building warehouses. It doesn't exactly make sense, but that's the argument.
"Our 2015 car models are totally profitable if we just stop making new cars and continue producing only 2015 models for the next decade."
From the article:

> What is happening here is that leading AI labs are charging not only for inference but also for research in model architecture, training data collection and curation, model training cost (which can be tens or even hundreds of millions of dollars), paying their employees and recovering the marketing costs.

That's what's being subsidized.

You are saying it as if those costs were not necessary to provide the service.
They are not. They are necessary for the development of future models, which does not influence the availability of the current ones. Plus you have Chinese models distilling current SOTA for pennies on the dollar, so as a consumer I will never be worse off in the long run (1-2 years).
OpenAI inference revenue exceeds its cost of inference by a good margin in 2025 (https://cdn.arstechnica.net/wp-content/uploads/2026/06/opena...)
Great, but that's only a part of operational costs. A craftsman's revenue may exceed the electricity bill for the power drill, doesn't mean the business is sustainable.
Is this supposed to be some sort of gotcha? Apart from research and marketing, that's operational costs. I mean, every product could be cheaper, if you didn't have to pay for employees and means of production.
What assumptions are needed for inferring cost based on api pricing?
The API inference cost to customers is not the actual cost of providing inference, and the cost of providing API inference need not be the cost of providing subscriber inference.
That API pricing isn't massively inflated.
> 1. We're still in the "$5 airport Uber" era of LLMs. They're heavily subsidized, and everyone still complains about costs.

How does that figure look if you count in the current unprecedented LLM/AI-driven price inflation on hardware, services, and software? I don't believe we're exactly in the "$5 airport Uber" era if you count that into your total.

To draw a parallel: airport Ubers are still $5, but you can't buy a second-hand Prius any more!
Following your parallel: except for the fact that you still need a car in your life, even if you take an Uber to the airport when needed :)

And in this analogy you need to spend a lot more when buying a car, no matter if it's a new or 2nd hand one, following the price inflation caused by cheap Ubers. So in essence, my question is how much have those cheap Uber rides then cost you in reality, when factoring in the directly related price increases for the things you need and buy? Is it a net positive or negative at the end of the day for anyone other than the very few at the very top of the system?

It's just drawing a parallel. We may be at the 10-cent Uber. But while oil and labour costs tend to go up, tokens as used today will probably cost what they cost today or less. And we won't just go to the airport; if we can go to Mars, we will ask for it.
It's about what you pay, not about what it costs.
Mostly agreed, however I'm not sure about 3: I suspect it works like gym memberships, and the companies mostly make their money from people who don't use the subscriptions all that much.
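
The gym-membership idea can be sketched as a blended average over user types. The shares and per-user serving costs below are invented purely for illustration, and `blended_margin` is a hypothetical helper, not anyone's actual pricing model.

```python
def blended_margin(price: float, users: list[tuple[float, float]]) -> float:
    """Average monthly profit per subscriber.

    users holds (share_of_subscribers, monthly_serving_cost) pairs,
    with the shares assumed to sum to 1.
    """
    avg_cost = sum(share * cost for share, cost in users)
    return price - avg_cost


# Hypothetical mix on a $20 plan:
# 90% light users costing $5/month to serve, 10% heavy users costing $120/month.
mix = [(0.9, 5.0), (0.1, 120.0)]
print(f"profit per subscriber: ${blended_margin(20.0, mix):.2f}")

# Same plan, but with 20% heavy users the blended result turns negative.
heavier_mix = [(0.8, 5.0), (0.2, 120.0)]
print(f"profit per subscriber: ${blended_margin(20.0, heavier_mix):.2f}")
```

The point of the sketch is only that a plan can lose money on every heavy user and still come out ahead overall, and that the result is very sensitive to the share of heavy users.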
I think the problem is that the companies mostly don't make money, period. They may have better unit economics on underused subscriptions, but I don't see a world in which OAI/Anthropic don't heavily tighten the screws in the future.

Right now it's silly to default to frontier models, but it won't bankrupt your company. I believe in the short-medium term future, we'll need to be more deliberate about model choices.

In the long-term, of course, tech costs tend to plummet. Is there a future where in 15 years, my Apple Watch locally runs an Opus 4.8-class model? Maybe. And that would obviate this whole discussion.

I'm just here so I can look at my post history and have a hearty laugh about it in a few years.
I'd say that is/was their long game, but it's still very much in hype phase so there's a lot of people intensively using these models, and I don't think it's anywhere near cost efficient right now. Maybe in the long run when people get bored with it, but on the other hand people are becoming dependent on it for everyday things.

We've already seen price hikes / token limits earlier this year, with suddenly some people running out of budget on the first day of the month. This will likely keep going for a while.

On the other hand, costs will drop too, thanks to open models and specialized hardware, as the article notes. The long-term question will be whether the companies get a return on their invested billions. I don't think they will, not with the amount of competition they're facing, and I don't think any one company or model (series) has a monopoly yet. Popularity sure, but I'm confident a competitor may appear tomorrow and people will switch.

I follow a guy called Daniel McCarthy on LinkedIn who writes a lot on CLV and that seems to be his take. Even if theoretically you get way more than you pay with subscriptions, the vast majority of people are not power users.

https://danielminhmccarthy.com/

The vast majority of active users of ChatGPT could successfully use a model like Gemma 4 12B with agentic search if x86 hardware didn't make that so difficult.

Likely even the E4B, which is really both fun and impressive.

That is clearly a big component of Apple's bet, anyway.

I have experimented with it and E4B is perfectly capable of being useful if you provide it with ready-to-use skills.

It's still more like programming than telling a chatbot to go make you GTAVI in JavaScript and make sure the graphics are as good as the original.

Maybe a safer prediction would be that most people will be fine just using hybrid agentic programs that run the models locally (probably with extra spyware). I think this is Apple's bet.

> I suspect it works like gym memberships, and the companies mostly make their money from people who don't use the subscriptions all that much.

I think it's like that, but not quite. The people who have a subscription but barely use it were probably never doing any serious work with AI in the first place. I.e., why would they get a subscription when their one or two chat questions (or, "make a picture of me as a superhero" prompts) per day can be had for free?

Especially with Claude, I think people who subscribe skew very heavily towards people that can very easily make more than $20 worth of queries in a month. And then there's the not-insignificant number of people who are tokenmaxxing.

It's like the gym membership model except ten percent of members are able to spend 72 hours per day at the gym while the rest spend 8 IMO.

Based on the people I know, they're paying because when they ask, they want the smartest model to be the one answering. There's still quite a difference between models.
Technically yes but it's not hard to get to $20 plan caps. Till current hardware prices cool down I don't see it being easy to make money on frontier models.
> We're still in the "$5 airport Uber" era of LLMs. They're heavily subsidized, and everyone still complains about costs.

Inference is not exactly cheap. What makes you think this is still "heavily subsidized"? What would the token cost have to be, with current models, for it not to be? What do you know that leads you to make such a claim?

> 1. We're still in the "$5 airport Uber" era of LLMs. They're heavily subsidized, and everyone still complains about costs.

Do they? It's free right now at chat.com. After that it's $20/month which isn't much in the US. Three Starbucks or two meals at McDonald's will run you more than that these days.

> 2. There hasn't been a real incentive to work on cost optimization for data centers and the hardware they contain. When/if price hikes happen and send people scrambling to use other models or drastically reduce AI usage, this will suddenly need to happen.

It's actually worse: the AI explosion is hiking hardware prices faster than capacity can catch up, and capacity likely won't catch up for a while, because the investments are expensive, take a long time, and might not seem like all that good an idea while the bubble is still bubbling.

The massive push frankly also made it unsustainable. If RAM didn't cost 3x as much and compute manufacturers had to compete instead of selling every unit instantly at whatever price they want, frontier model tokens might have cost something closer to a sustainable amount.