Hacker News
by mike_hearn 98 days ago
I'd love to be a fly on the wall when this argument is tried in front of a bankruptcy court. It drives me nuts. Of course there's evidence that they're selling tokens at a loss.

The only thing these companies sell is tokens. That's their entire output. OpenAI is trying to build an ad business but it must be quite small still relative to selling tokens because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.

That means the cost of "inference" is all their costs combined. You can't just arbitrarily slice out anything inconvenient and say that's not a part of the cost of generating tokens. The research and training needed to create the models, the salaries of the people who do that, the salaries of the people who build all the serving infrastructure, the loss leader hardcore users - all of it is a part of the cost of generating each token served.

Some people look at the very different prices for serving open weights models and say, see, inference in general is cheap. But those costs are distorted by companies trying to buy mindshare by giving models away for free. On top of that, both of the top labs keep claiming that Chinese labs are distilling their models like crazy, including using many tactics to evade blocks! So apparently the cost of a model like DeepSeek is still partly being subsidized by OpenAI and Anthropic against their will. The cost of those tokens is higher than what's being charged, it's just being shifted onto someone else's books. Nice whilst it lasts, but this situation has been seen many times in the past and eventually people get tired of having costs externalized onto them.

For as long as firms are losing money whilst only selling tokens, that means those tokens are selling at a loss. To not sell tokens at a loss the companies would have to be profitable.

9 comments

The article is about compute cost though. By "lose money on inference" I mean the assertion that inference has negative gross margins, which a lot of people truly believe. This is important because it's common to reason from this that LLMs are uneconomical and a ticking time bomb where prices will have to be jacked up several orders of magnitude just to cover the compute used for the tokens.

But there's no such thing as compute cost in the abstract. What exactly is compute cost for AI? Does it include:

• Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.

• Gradient descent itself?

• The CPUs and disks storing and managing the datasets?

• The web servers?

• The people paid to swap out failed components at the dc?

Let's say you try and define it to mean the same as unit economics - what does it cost you to add an additional customer vs what they bring in. There's still no way to do this calculation. It's like trying to compute the unit economics of a software company. Sure, if you ignore all the R&D costs of building the software in the first place and all the R&D costs of staying competitive with new versions, then the unit economics look amazing, but there's still plenty of loss-making software startups in the world.

Unit economics are a useful heuristic for businesses where there aren't any meaningful base costs required to stay in the game because they let you think about setup costs separately. Manufacturing toys, private education, farming... lots of businesses where your costs are totally dominated by unit economics. AI isn't like that.

Gross margins and cost of revenue are well defined accounting terms that apply to any type of business.

> Does it include:

> Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.

No because this is training and not inference. Just like how R&D costs for a drug aren't part of COGS either.

> Gradient descent itself?

No

> The CPUs and disks storing and managing the datasets?

Yes

> The web servers?

Yes

> The people paid to swap out failed components at the dc?

Yes to the extent they are swapping for inference and not training. If the same employees do both then the accountants will estimate what percent of their time is dedicated to each and adjust their cost accordingly.

We weren't talking about COGS, we were talking about "cost of compute", which isn't an accounting term.

For the rest, anyone can define and apply an accounting metric but that doesn't mean it tells you anything useful. If you look at the unit cost of any typical IP business it's nearly zero. Yet, many companies lose money on making movies, video games, apps and books.

I'm not familiar with accounting, but I suspect a lot of these cloud infrastructure companies don't throw out hardware for a very long time, much like how AWS sells you its old stuff as white-label compute at a markup, behind which I think is mostly old hardware. As long as Anthropic keeps finding uses for the old GPUs, and provided they don't break, I think it doesn't have to write off these assets, which means it doesn't incur costs using them if it is clever with its books.

The marginal cost of the next token. That can include the power, the operating cost of the facility, repair costs, etc.

The API price should hopefully incorporate the capitalized cost of the hardware, the facility rent, the cost to train the model, the r&d, cost of sales, etc., to make it profitable.

Claude Code Max may be able to offer a good price by having a mix of higher- and lower-utilization users and ignoring the fixed costs, treating it as a driver of API sales. But it doesn't make sense to essentially pay people to use it.

Your point that there are more relevant quantities to calculate when checking economic viability is fair, but that doesn't negate the "cost of inference" being an interesting metric in itself.

This comment defies common usage and accounting practices.

When people say “selling at a loss” they mean negative unit economics. No one ever means this much more expansive definition you’ve invented.

What you are talking about isn't inference cost. Yes, fundamentally what matters is all the work that goes into the models, including R&D, training, and inference.

But we talk about inference separately for a reason: inference cost is largely the scaling cost. Once you have a model, the margin on your inference is how you get to profitability. As long as your margin is positive, you can make the entire enterprise profitable by just selling more tokens. This is the same fundamental business that chip fabs work on. Yes, it costs them a lot to get to the next node, but what's important is the margin they can get on the wafers they sell, because they sell tonnes of wafers.

It's pretty core to the concept of SaaS businesses that yes, you do consider all costs. But you want to focus on the margin of the bit that scales. This is why WeWork blew up: the thing they were scaling only scaled at a negative margin.

The point is that if their inference margin is positive, they can "just" scale up and become profitable. If their inference margin is negative, then scaling up the business actually causes problems.
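To make the "just scale up" argument concrete, here is a rough sketch of the break-even arithmetic. The function name and every number in it are hypothetical, chosen only for illustration; nothing here reflects any lab's real prices or costs. It assumes a constant marginal cost per million tokens and a single lump of fixed costs (training, R&D, salaries).

```python
def tokens_to_break_even(price_per_mtok, marginal_cost_per_mtok, fixed_costs):
    """Millions of tokens needed to cover fixed costs.

    Returns None when the per-token margin is zero or negative,
    because in that case no sales volume ever covers fixed costs.
    All inputs are hypothetical illustration values.
    """
    margin = price_per_mtok - marginal_cost_per_mtok
    if margin <= 0:
        return None  # scaling up only deepens the loss
    return fixed_costs / margin


# Positive margin: $10 price, $4 marginal cost, $6M of fixed costs.
print(tokens_to_break_even(10.0, 4.0, 6_000_000.0))   # 1000000.0

# Negative margin: $10 price, $12 marginal cost. No break-even exists.
print(tokens_to_break_even(10.0, 12.0, 6_000_000.0))  # None
```

With a positive margin the answer is a finite volume, however large. With a negative margin there is no volume that works, which is the WeWork case.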

Actually you can slice out a lot of things. It's even a GAAP metric, i.e. one of the common baseline metrics that public companies are required to report, known as gross margin: literally just (revenue - COGS) / revenue. It is distinct from net margin, but both are useful, and a low gross margin says something very different from a low net margin about the long-term prospects of the business.
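As a rough illustration of how the two metrics can diverge, here is a small sketch. The gross margin formula is the one quoted above; the net margin here is a simplification that only subtracts operating expenses (it ignores interest, taxes and other items), and all figures are made up.

```python
def gross_margin(revenue, cogs):
    """(revenue - COGS) / revenue, as quoted in the comment above."""
    return (revenue - cogs) / revenue


def net_margin(revenue, cogs, operating_expenses):
    """Simplified net margin: ignores interest, taxes and one-off items."""
    return (revenue - cogs - operating_expenses) / revenue


# Hypothetical company: 100 of revenue, 40 of COGS (serving costs),
# 90 of operating expenses (R&D, training, SG&A).
revenue, cogs, opex = 100.0, 40.0, 90.0

print(gross_margin(revenue, cogs))      # 0.6
print(net_margin(revenue, cogs, opex))  # -0.3
```

A company like this is unprofitable overall while still earning 60 cents of gross profit on every dollar of sales, which is the distinction the two metrics are meant to capture.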
This is all true but it isn't really important for the argument people are making. What is more important is the marginal cost per token. If each token were sold at a marginal loss, their losses would scale with usage, and that simply can't be happening with API pricing. But in general, yes I agree with you and I'm sure they are taking a huge loss on Claude Code.

It looks to me like their losses have scaled with usage, though? They keep predicting their losses will increase even as usage has gone stratospheric.

They are certainly making huge bets that are risky, and so yes, on their P&L the L is scaling. That doesn't say anything at all about their marginal inference cost.

You're missing costs.

- Amortized training costs.

- SG&A.

- Capex depreciation.

All of the above impact profitability over various time horizons and have to be rolled into present and projected P&L and cash flow analysis.

We have amortized training cost estimates. Inference to training compute over model lifetime is 10:1 or over for major models at major providers.

In part due to base model reuse and all the tricks like distillation. But mainly, due to how much inference the big providers happen to sell.

So, not the massive economic loss you'd need to push models away from being profitable. Capex and R&D take the cake there.
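A back-of-the-envelope sketch of what a 10:1 ratio implies, assuming (and this is an assumption, not something stated above) that a unit of training compute costs about the same as a unit of inference compute. The function names are made up for illustration.

```python
def training_share_of_compute(inference_to_training_ratio):
    """Fraction of lifetime compute spent on training, given the ratio."""
    return 1.0 / (1.0 + inference_to_training_ratio)


def training_overhead_per_inference_unit(inference_to_training_ratio):
    """Training compute amortized over each unit of inference compute."""
    return 1.0 / inference_to_training_ratio


ratio = 10.0  # the "10:1 or over" figure from the comment above

print(round(training_share_of_compute(ratio), 3))  # 0.091
print(training_overhead_per_inference_unit(ratio))  # 0.1
```

Under that assumption, amortized training adds roughly 10% on top of inference compute, or about 9% of lifetime compute, and less if the ratio is higher than 10:1.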

I don't think you are an accountant.

One very minor note: Anthropic and others, like most "enterprise" solutions, also sell SSO + SCIM + audit logs. Their business plans have fewer tokens and higher prices to cover the enterprise features, which should be essentially free to provide in 2026.
It depends on how we are looking at the business. Absolutely, at the end of the day a company is profitable or not, but when thinking about inference, which is largely a commodity these days, you would first think about its marginal cost. That is the cornerstone of the business. We have a pretty clear indication that API tokens are largely being sold above marginal cost. Especially for a brand-new business, that’s critical, and it is something that many unicorns never even hit.

You’re right that all the other costs are critical to measuring the profitability of the business, but for such a young industry that’s the unknown. Does training get cheaper? Do we hit a theoretical limit on training? Are there further optimizations to be had?

You don’t incur large capex in an industry and then, in year one, argue that the business is doomed when you’re selling the product above marginal cost but have not yet recouped the costs that were capitalized.