by tonymelony 9 days ago
This rumor is not demonstrably true. The subscription prices are competitive and for heavy users even cheap compared to API rates, but there is no evidence that they are structurally priced below cost.
6 comments

A good way to think about it is to work out how much it'd cost to buy and run a GPU that runs a model at around 100 tok/s ("thinking" agents are not viable otherwise).

The figure mentioned in the video is not far off

That's not evidence, that's a theory of evidence existing somewhere.
Wait for the price fixing that will eventually come after the horse race. Just like internet, phone, tv, etc. The prices are universally increased in tandem.
Are you including capex when you say "cost"? Or are you just looking at inference costs?
It doesn't make sense to include the capex cost to train a model in this kind of discussion, because that cost is fixed.

Consider a model that costs $100m to train.

If the vendor then prices it such that each inference token has a margin of 10% over the variable costs to serve (power + server costs), whether or not they cover their costs is based entirely on how many tokens they can sell.

If they sell less than $1bn of tokens, they lose money - the break even point is 10x100m = $1bn.

If they sell $10bn of tokens they make a ton of money.

This also means you can't credibly calculate how much of the fixed training expense is covered by your token spend, because until the model is retired and you can account for how much inference it ran you don't know what percentage of the training cost each sold token was responsible for.
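A minimal sketch of the break-even arithmetic in this comment, in case it helps. The function names are made up for illustration. It shows both readings of "10% margin": as a share of the sale price (which gives the $1bn figure above) and as a markup over variable cost (which puts break-even slightly higher).

```python
def contribution_share(margin, margin_on="revenue"):
    """Fraction of each revenue dollar left after variable serving costs."""
    if margin_on == "revenue":
        # Margin quoted as a share of the sale price.
        return margin
    if margin_on == "cost":
        # Margin quoted as a markup: price = cost * (1 + margin).
        return margin / (1 + margin)
    raise ValueError("margin_on must be 'revenue' or 'cost'")


def break_even_revenue(training_cost, margin, margin_on="revenue"):
    """Token revenue needed before the fixed training cost is repaid."""
    return training_cost / contribution_share(margin, margin_on)


def lifetime_profit(token_revenue, training_cost, margin, margin_on="revenue"):
    """Profit over the model's life once total token revenue is known."""
    return token_revenue * contribution_share(margin, margin_on) - training_cost


TRAINING_COST = 100e6  # the $100m example from the comment
MARGIN = 0.10          # the 10% margin from the comment

print(f"break-even, margin on revenue: ${break_even_revenue(TRAINING_COST, MARGIN) / 1e9:.2f}bn")
print(f"break-even, markup on cost:    ${break_even_revenue(TRAINING_COST, MARGIN, 'cost') / 1e9:.2f}bn")
print(f"profit at $10bn of tokens:     ${lifetime_profit(10e9, TRAINING_COST, MARGIN) / 1e9:.2f}bn")
```

Note that `lifetime_profit` needs the total token revenue as an input, which is the point made above: the number is only known after the model is retired.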

Cost is fixed if you train a model once in several years. If you have to train 3-4 times per year to stay competitive, training cost is a thing.

You also have to include failed training runs and experiments in the math.

There are no official figures, but given how fast new models are rolled out, I wouldn't be surprised if neither Anthropic nor OAI manages to cover the full cost of its models.

I think the capex being fixed assumes you can just stop training the next model. But it's not clear that you can afford to do that and keep selling tokens.

And if capabilities plateau such that training the next one is useless, then the margins will drop fast due to competition.

Model inference:training compute for frontier models is estimated to be over 10:1 now.

Driven mostly by just how much inference they sell nowadays - but also by things like base model reuse.

> This rumor is not demonstrably true.

OpenAI, Anthropic, and Microsoft/Meta/Google are all at a net negative on AI (i.e. they're "demonstrably" losing money). So it is objectively true. If everyone is losing money, and nobody is profitable, then it is a demonstrable fact.

As far as I know, the only "AI" venture currently in the green is Nvidia, and they're selling shovels to gold miners.

They are losing money because they are training new models and building new data centers. The claim of the video is that they're losing money just serving current AI models. There's just no evidence of that.
> They are losing money because they are training new models and building new data centers.

Neither of which ever goes away. These aren't short term costs, they're the costs of running their business, and it isn't profitable.

> The claim of the video is that they're losing money just serving current AI models.

Which is true. Everyone is losing money, none are profitable. They're losing money serving current AI models.

> There's just no evidence of that.

Their own profit/loss statements are "evidence of that." According to these companies themselves, they're at a net loss every quarter. So it isn't clear what more "evidence" people need or expect.

It is demonstrably true.

Grab gpt-oss-120b, run it continuously and see how far 20 dollars worth of that gets you. People definitely use much more than that in a month, not just power users but regular ones, and they're using models that are more expensive to run (plus the "cloud" markup).

I mean, this is difficult to calculate because of prompt caching, the ratio of input/output tokens, etc., but if you just do some napkin math, I find it hard to believe people are getting this many tokens on a $20 plan.

Here's some napkin math.

gpt-oss-120b is priced at $0.039 in / $0.18 out per million tokens on OpenRouter. Here are some assumptions.

1. The ratio of input/output is about 25/1 (coding is mostly grep and fairly low output).

2. You are getting 75% prompt cache reads.

Case B: 50% prompt caching discount (standard provider rate), at 75% prompt caching:

Total tokens obtained: 658,749,010 (approx. 659 million tokens)

Input: ~633 mil tokens

~475 mil cached at 50% input pricing = ~$9.25

~158 mil uncached = ~$6.15

Output: ~25 mil tokens = ~$4.50

This doesn't even account for profit margins on inference providers, or the fact that OpenAI probably has a much more efficient inference stack.

It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference pricing carries a 50% margin.
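The napkin math above can be reproduced with a short script. This is only a sketch under the comment's own assumptions ($20 budget, $0.039 / $0.18 per million tokens, 25:1 input/output, 75% cache reads billed at a 50% discount); the function and key names are made up for illustration.

```python
def tokens_for_budget(budget_usd, in_price, out_price,
                      in_out_ratio, cache_hit_rate, cache_discount):
    """Estimate how many tokens a budget buys.

    Prices are USD per million tokens. Token counts in the result are in millions.
    """
    cached_price = in_price * (1 - cache_discount)
    # Average input price after mixing cached and uncached reads.
    blended_in = cache_hit_rate * cached_price + (1 - cache_hit_rate) * in_price
    # One "unit" of traffic = in_out_ratio million input tokens + 1 million output tokens.
    unit_cost = in_out_ratio * blended_in + out_price
    units = budget_usd / unit_cost

    input_m = units * in_out_ratio
    cached_m = input_m * cache_hit_rate
    uncached_m = input_m - cached_m
    output_m = units
    return {
        "total_m": input_m + output_m,
        "input_m": input_m,
        "cached_m": cached_m,
        "uncached_m": uncached_m,
        "output_m": output_m,
        "cached_usd": cached_m * cached_price,
        "uncached_usd": uncached_m * in_price,
        "output_usd": output_m * out_price,
    }


result = tokens_for_budget(
    budget_usd=20.0,      # the $20 plan
    in_price=0.039,       # quoted input price per million tokens
    out_price=0.18,       # quoted output price per million tokens
    in_out_ratio=25,      # assumption 1 from the comment
    cache_hit_rate=0.75,  # assumption 2 from the comment
    cache_discount=0.50,  # "50% prompt caching discount"
)
for key, value in result.items():
    print(f"{key:>13}: {value:,.2f}")
```

The total comes out at roughly 659 million tokens, matching the figure quoted above; the per-line dollar amounts differ by a few cents because the comment rounds the token counts first.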

I didn't say "use openrouter", as you might end up using subsidized resources; part of the argument is to avoid that and reach the true capital cost of inference per token (or something like that).

I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.

Here's the creator of OpenCode explaining how you are wrong:

https://youtu.be/1VqKUrxR2C8?si=uOAs_4XNXtTyTwCP&t=2195

He's either incompetent or lying.

An H100 today costs $2.95 an hour on vast.ai[1], which is already a good deal.

gpt-oss-120b on an H100 gives you ~200-250 tokens per second. I will be generous and say you can get a million tokens an hour out of it.

OpenCode Go (which I gladly pay for, in part because of this) is $10 a month. That's three hours of H100 use, and the models you get there are more expensive than gpt-oss-120b. Sure, they have "scale" (although that doesn't apply to AI inference, but whatever) and this and that, but they're still pricing it 20-30x below their minimum threshold of capital expense.

Apples to apples: they sell you GLM 5.1 at $4.40 per million tokens. At ~50 tps on an H100 (being generous), it costs ~$16 to generate a million tokens.

The math is simple and clear, they lose money.

1: https://vast.ai/pricing
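For anyone who wants to check the arithmetic in this comment, here is a rough sketch. It takes the comment's numbers at face value and assumes, as the comment does, that the quoted tokens per second is everything the rented GPU produces; a server that batches many requests on one GPU would give a different result. The function name is made up for illustration.

```python
def cost_per_million_tokens(gpu_usd_per_hour, tokens_per_second):
    """Rental cost of generating one million tokens at a given total throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


H100_USD_PER_HOUR = 2.95  # vast.ai price quoted in the comment

# gpt-oss-120b at the upper end of the quoted 200-250 tokens/s.
gpt_oss_cost = cost_per_million_tokens(H100_USD_PER_HOUR, 250)

# GLM 5.1 at the quoted ~50 tokens/s, against the quoted $4.40 sale price.
glm_cost = cost_per_million_tokens(H100_USD_PER_HOUR, 50)
glm_price = 4.40

# Hours of H100 rental that a $10/month subscription would pay for.
hours_per_subscription = 10 / H100_USD_PER_HOUR

print(f"gpt-oss-120b: ${gpt_oss_cost:.2f} per million tokens")
print(f"GLM 5.1:      ${glm_cost:.2f} per million tokens, sold at ${glm_price:.2f}")
print(f"$10 buys {hours_per_subscription:.1f} hours of H100 time")
```

Under these inputs the GLM 5.1 figure lands near the ~$16 quoted above, and the whole conclusion hinges on the throughput number you plug in.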

It's kind of hilarious, and so Hacker News-coded, to think you know more about the product you use than the guy who actually built it.
Okay then provide a link to a Dropbox PDF or official documentation “demonstrating” the premise is “untrue” please. Or admit you’re blinded by faith. Or financially interested in the public believing in a hypothetical like your second sentence.

In short, citation needed or shens bruh.

If you're claiming "AI inference is sold at a loss", it's on you to prove it.

All we have actual evidence of is: some users use enough AI that the subscription is sold at a loss to them (up to degenerate cases: usage maxed out at all times), if billed by API metrics, while some other users are, by the same metrics, profitable (down to degenerate cases: a forgotten subscription with $20 a month and 0 usage).

We don't know how API prices relate to costs - we only have estimates. And we certainly don't know how much inference an average subscription user consumes.

If you have some sort of information that would decisively prove that the aggregate is "AI company N is losing money on subscriptions", then, show it.

Or is it you who's blinded by faith? Like some sort of AI bubble cultist? The bubble is real, you just have to believe in it?

Very well said. People are making a lot of claims when very little knowledge of the financials is public. If you actually look at the numbers, there are plenty of ways in which API revenue and forgotten subs could more than make up the difference for power users. Even if power users are getting 10-20x their sub fee in tokens, the math could still work out. Personally, I doubt more than 5% of Claude subs even approach max usage, because it requires having so many agents running all of the time.
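To make the "math could still work out" point concrete, here is a toy calculation. Only the 5% power-user share, the 10-20x range, and the $20 fee come from this thread; the other segment shares and cost multiples are invented for illustration, and the multiples are treated as serving cost relative to the fee, which nobody outside these companies actually knows.

```python
def margin_per_subscriber(fee, segments):
    """Average monthly margin per subscriber.

    segments: list of (share_of_subscribers, serving_cost_as_multiple_of_fee).
    """
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    average_cost = sum(share * multiple * fee for share, multiple in segments)
    return fee - average_cost


FEE = 20.0  # the $20 plan discussed in the thread


def hypothetical_mix(power_user_multiple):
    """A made-up subscriber mix; only the 5% power-user share comes from the thread."""
    return [
        (0.05, power_user_multiple),  # power users near max usage
        (0.70, 0.5),                  # hypothetical: regular users costing half the fee
        (0.25, 0.0),                  # hypothetical: forgotten subs with zero usage
    ]


margin_at_10x = margin_per_subscriber(FEE, hypothetical_mix(10))
margin_at_20x = margin_per_subscriber(FEE, hypothetical_mix(20))

print(f"power users at 10x: ${margin_at_10x:+.2f} per subscriber per month")
print(f"power users at 20x: ${margin_at_20x:+.2f} per subscriber per month")
```

With this made-up mix the result is about +$3 per subscriber at 10x and about -$7 at 20x, so the sign of the aggregate depends entirely on inputs that are not public.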

I imagine we'll know in a few months when these companies go public.