by elbasti 104 days ago
"Any conversation about token costs devolves into an ad-hoc, informally-specified, bug-ridden implementation of half of generally accepted accounting principles."

We have a way of determining if Anthropic is, or has the capability of being, profitable, and what the levers to that may be. AI may be world-changing, but the accounting principles behind AI labs are no different than those behind a Pizza Hut.

Even if the cost of "inference + serving" is lower than the price a token sells for, the relevant question is what the depreciation schedule of the cost of training is. I.e., if I spend $1 on training, how long do I have before I have to spend $1 again?
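
To make the "how long before I have to spend $1 again" question concrete, here is a minimal sketch of a straight-line schedule. The function names and the revenue and serving figures are hypothetical, picked only for illustration; nothing here reflects any lab's real numbers.

```python
def annualized_training_cost(training_cost: float, useful_life_years: float) -> float:
    """Straight-line charge per year: training cost spread evenly over the model's useful life."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return training_cost / useful_life_years


def annual_margin(revenue: float, serving_cost: float,
                  training_cost: float, useful_life_years: float) -> float:
    """Yearly revenue minus yearly serving cost minus the yearly training charge."""
    return revenue - serving_cost - annualized_training_cost(training_cost, useful_life_years)


# Hypothetical figures (dollars per year for revenue and serving), illustration only.
REVENUE, SERVING, TRAINING = 1.5, 0.5, 1.0

for life in (0.5, 1.0, 2.0):
    margin = annual_margin(REVENUE, SERVING, TRAINING, life)
    print(f"useful life {life} yr -> annual margin {margin:+.2f}")
```

Under these made-up inputs the same $1 of training is a loss if it has to be respent every six months, break-even at one year, and a profit at two, which is why the useful life is the lever the post is asking about.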

Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable. So the question is:

What can be done to make training depreciate more slowly? Perhaps users can be persuaded to stick around using non-frontier models for longer, although then there's a shift in the competitive landscape.

If users cannot be persuaded (forced?) to use legacy models, then the entire business model is thrown into question, because there's no reason why training frontier models would ever get cheaper: even if it gets cheaper on the margin, surely that will result in more compute used to generate an even "better" model, resulting in more spend in the aggregate.

This doesn't mean that the AI industry is "doomed". A couple of things could happen, and this is where the frontier labs should be focusing their attention:

1. They could find a way to climb up the value chain and capture more of the consumer surplus.

2. There could be a paradigm shift in compute architecture/compute cost.

3. We could reach a limit of marginal utility, shifting consumption to legacy models, thereby lengthening the depreciation/utility of training.

Edit: My assertion of "Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable." is made with no real information, just a gut feeling, and should not be taken seriously.

6 comments

Dario has made a specific cohort argument here. His numbers (from various interviews) are: you train a model in 2023 for $100M, deploy it, and it earns $200M over its lifetime. Meanwhile you train the 2024 model for $1B, which goes on to earn $2B. Each vintage returns 2x on its training cost.

However, the GAAP P&L tells the opposite story. You book $200M revenue in the same year you spend $1B training the next model, so you report an $800M loss. Next year you book $2B against $10B in training spend, reporting an $8B loss. The business looks like it's dying when every individual model generation actually generates a healthy profit.

That's actually Dario's answer to your depreciation question. If each cohort earns back its training cost within its natural lifespan (however short that lifespan is), the depreciation schedule is already baked in. The model doesn't need to live forever, it just needs to return more than it cost before the next one replaces it. Whether that's actually happening at Anthropic is a different question, and one we can't answer without audited financials, but it's the claim Dario makes (and seems entirely reasonable from a distance).
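
The two views can be put side by side in a small sketch. The first two cohorts and the $10B training spend are the figures quoted above; the third cohort's 2x return is an assumption carried forward from the pattern. It also assumes training is expensed in the year it is spent, each model earns its revenue in the following year, and inference cost is ignored.

```python
# (training year, training cost, lifetime revenue), all in $M.
COHORTS = [
    (2023, 100, 200),
    (2024, 1_000, 2_000),
    (2025, 10_000, 20_000),  # $10B spend is from the comment; the 2x return is assumed
]


def cohort_profits(cohorts):
    """Per-model view: lifetime revenue minus that model's own training cost."""
    return {year: revenue - cost for year, cost, revenue in cohorts}


def yearly_pnl(cohorts):
    """Calendar-year view: last year's model revenue minus this year's training spend."""
    spend = {year: cost for year, cost, _ in cohorts}
    revenue = {year + 1: rev for year, _, rev in cohorts}
    return {year: revenue.get(year, 0) - spend[year] for year in sorted(spend)}


print("per cohort:", cohort_profits(COHORTS))
print("per year:  ", yearly_pnl(COHORTS))
```

Every cohort is positive while every calendar year is negative, and the yearly loss grows as long as the next training run keeps growing 10x.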

GAAP doesn't really work here. The R&D treadmill means you are always betting on next year, and it's NOT inventory or something you can defer your cost on. It's an upfront R&D expense.

So what happens in year 10 when Anthropic hits a $10B training run and only returns $8T? They're cooked.

Yeah, that's kind of what I'm wondering about.

It's an interesting story about how, even though all metrics show massive losses, they actually have massive gains.

Accounting is a rather mature field, so I figure that someone in the past has tried this stunt and there should probably be ways of dealing with it.

Or do they always flame out after losing all the money? Knowing the history here would be informative.

If those numbers are correct, then my assertion that "Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable." is incorrect.

And I admit that I made that assertion from my gut without actually knowing if it's true or not.

If you have to continually spend greater amounts of money to keep up with the competition on every new model then it is dying.

Every single time a company comes around and goes "Actually GAAP is wrong, look at my new math that says we're good", it's led to much wailing and gnashing of teeth in the future when it inevitably isn't.

That's an interesting idea. I'm curious, though, are there any other industries and/or companies that have tried to pull this sort of thing off? And what ultimately happened to them?

Enron had a system like this. They regularly worked on large, long-term contracts that became profitable over years/decades. They wanted to pull rewards forward, so they would estimate the total value of the contract and book the profit when it closed. Mark-to-market accounting wasn't unheard of at the time, but using it for assets without an active market was unique. Without a market to mark against, the numbers were best-guess projections.

The problem is everyone along the line is incentivized to be aggressive with estimates (commissions for sales are bigger, public financials look better) and discouraged from correcting the estimates when they go wrong.

Estimating multi-year returns on frontier models looks harder than estimating returns on oil and gas projects in the 90s.

The bar for "wildly unprofitable" has risen quite a bit since then, but Amazon basically pioneered this.

Why would anyone use the 200M model when the 1B model is available? The company increases its bet with each iteration, increasing the risk. It blows up at some point because it cannot guarantee a 2B return after a 1B investment.

To the GAAP point: the 200M or 1B or 10B is not a loss but cash converted into an asset. It won’t affect the bottom line at all, unless the company re-evaluates the asset and says it is now worth 1M instead of 200M. That would hit the bottom line.
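
A rough sketch of the difference this comment is pointing at, with hypothetical helper names. Whether training spend actually qualifies for capitalization is this commenter's claim and is disputed elsewhere in the thread, so treat this as arithmetic only, not accounting guidance. The amortization period is an assumption.

```python
def expensed(training_cost, years):
    """Treat training as R&D expense: the whole cost hits the income statement in year 1."""
    return [float(training_cost)] + [0.0] * (years - 1)


def capitalized(training_cost, years):
    """Treat training as an asset: amortize it straight-line over `years` (assumed life)."""
    return [training_cost / years] * years


def impairment_charge(book_value, revalued_to):
    """Write-down that hits the bottom line when the asset is revalued below book value."""
    return max(book_value - revalued_to, 0.0)


# $1B training run (in $M) over an assumed 4-year life.
print("expensed:   ", expensed(1_000, 4))
print("capitalized:", capitalized(1_000, 4))
# The comment's example: an asset carried at 200M revalued to 1M.
print("impairment: ", impairment_charge(200, 1))
```

Both schedules charge the same total; they differ only in which years take the hit, and a write-down pulls the remaining charge forward.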

If you can remember where you read it, could you share a link?

https://youtu.be/GcqQ1ebBqkc?t=1027 is one such, but he doesn't actually say that each model has been profitable.

He says "You paid $100 million and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume in this cartoonish cartoon example that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model is actually, in this example is actually profitable. What's going on is that at the same time"

Importantly, you'll notice that he's talking about revenue, and assumes that inference is cheap enough/profitable enough that 100M + Inference_Over_Lifetime < 200M.
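
Spelled out as a tiny sketch, using the $100M and $200M from the quote; the function names and the sample inference costs are hypothetical, since the quote gives no inference figure.

```python
def model_profit(training_cost, lifetime_revenue, lifetime_inference_cost):
    """Profit of one model 'as a company': revenue minus training minus inference."""
    return lifetime_revenue - training_cost - lifetime_inference_cost


def break_even_inference(training_cost, lifetime_revenue):
    """Largest lifetime inference cost at which the model still breaks even."""
    return lifetime_revenue - training_cost


TRAINING, REVENUE = 100, 200  # $M, from the quoted example

limit = break_even_inference(TRAINING, REVENUE)
print(f"break-even inference cost: ${limit}M ({limit / REVENUE:.0%} of revenue)")

# Hypothetical inference costs, for illustration only.
for inference in (50, 100, 150):
    print(f"inference ${inference}M -> profit {model_profit(TRAINING, REVENUE, inference):+d}M")
```

So the "profitable model" claim in the example holds only if lifetime inference cost stays under half of lifetime revenue.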

> They could find a way to climb up the value chain and capture more of the consumer surplus.

Yes, this is exactly why OpenAI and Anthropic are hyping AGI. If LLMs ever become good enough to replace workers, the first sign will be frontier model companies launching competitor businesses. It doesn't make sense to sell the formula for gold when you can just use it yourself.

> There could be a paradigm shift in compute architecture/compute cost.

Possible, but no signs of this on the horizon. If it does happen, it's impossible to predict when it will.

> We could reach a limit of marginal utility, shifting consumption to legacy models, thereby lengthening the depreciation/utility of training.

I'm not sure market dynamics will allow this any time soon. We seem to have already achieved a marginal utility equilibrium in terms of model size, so training new models on trending use-cases (e.g. synthetic data targeting tool calls, agentic workflows, computer use, etc) is really the driving force behind product differentiation. Nobody wants to admit "training new models isn't profitable" because that deflates the AGI singularity narrative that all this investment hinges on.

I'm not an accountant, but I would expect Pizza Hut's accounting is significantly more complex than Anthropic's. A 50+ year old global franchise with physical supply chain partnerships vs. an upstart SaaS company?

Your instincts are good here. Whatever complexity Pizza Hut has comes from being the weakest of the Yum! Brands siblings: KFC carries the international profit, Taco Bell owns domestic. Pizza Hut is slow growth, perpetual restructuring, and a weird inherited obligation to always serve Pepsi.

> Almost certainly, any reasonable depreciation schedule of the cost of training [...]

Maybe not? This is an argument that has to be made using numbers. We can't do the estimate without the numbers.

This is correct. I regret that assertion and have added a comment reflecting that.

The world labor market is ~35T USD yearly, so that is roughly the order of magnitude to balance against frontier model training cost. E.g., Dario Amodei has his "data center of PhDs" level, at which he assumes models are "good enough" to stop training frontier models; if that can take even 5% of the global labor market, that's ~1.5T a year in revenue, balanced against current model training costs of ~1B. Three orders of magnitude might get us to PhD level? I think that is ultimately the bet the big AI companies are making. Even if 1T is the cost of PhD-level AI, three or four companies could depreciate that over 4-5 years while sharing that 5% of the global market.
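
A back-of-envelope version of this arithmetic, as a sketch under loud assumptions: each company bears the full 1T training cost, the 5% share is split evenly, depreciation is straight-line, and serving costs are ignored. The 35T, 5%, 1T, three-to-four companies, and 4-5 years are the comment's figures; the rest is illustrative. Note that 5% of 35T works out to 1.75T, a bit above the ~1.5T quoted.

```python
# All money in $B.
LABOR_MARKET = 35_000        # ~35T per year, from the comment
CAPTURED_SHARE_PCT = 5       # "even 5%" of the labor market
TRAINING_COST = 1_000        # the hypothetical 1T cost of "PhD level" AI

captured_revenue = LABOR_MARKET * CAPTURED_SHARE_PCT / 100


def per_company(companies, years):
    """Return (yearly revenue per company, yearly straight-line depreciation per company)."""
    revenue = captured_revenue / companies   # assumes an even split of the captured share
    depreciation = TRAINING_COST / years     # assumes each company spends the full 1T
    return revenue, depreciation


print(f"captured revenue: {captured_revenue:.0f}B per year")
for companies in (3, 4):
    for years in (4, 5):
        revenue, depreciation = per_company(companies, years)
        print(f"{companies} companies, {years} yr: "
              f"revenue {revenue:.1f}B vs depreciation {depreciation:.1f}B per year")
```

On these inputs the yearly revenue per company exceeds the yearly depreciation charge in every case, but the result rests entirely on the 5% capture assumption.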

> The world labor market is ~35T USD yearly, and so that is roughly the order of magnitude to balance against frontier model training cost.

Crazy that people can write sentences like this with a straight face these days.

Of course a model does not really depreciate; the problem is they are forced by competitive pressure to offer newer/better models at the same price.

This is what the elites of the gilded age called "ruinous competition", and the solution today will be the same as back then: monopoly power. This has been the business plan of the tech VC industry for 25+ years.

Do they not depreciate?

The models don't learn without training, and they have finite context windows. As software updates around the world, don't they have to be trained on the new information to stay up to date?

Fair, but in this context people are generally contemplating the need to replace the model with a new, much larger and more expensive model, not just refresh the training set.

It's partly about updating what it "knows", but more about keeping up with competitive pressure on capabilities.

I’m actually not familiar enough to know. Can models be refreshed for cheaper? I thought due to the black box nature of them that there would be no difference between updating and generating a whole new model.

Maybe they can get to a “good enough” level where the next model isn’t 10x the price, but if the business model requires ever-increasing sizes to paper over the R&D costs from the previous set, then I don’t understand how they would transition to profitability.

People? There's a guy upthread quoting the Anthropic CEO on how they view the value of increasing training against the offset of the entirety of the $35T worldwide labor market... It's not "people". It's the salesmen.