HN Mirror


by JimDabell 381 days ago

> you also don't have any evidence that they are profitable.

Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.

Yes, capex not opex. The cost of running inference is opex.

8 comments

antman 381 days ago

No we don't, MS used their OpenAI position as a strategy to increase Azure adoption. I am surprised AWS didn't give ls for free

wkat4242 380 days ago

Microsoft's plans for OpenAI are much bigger than just Azure. They have 30 products/services with Copilot in the name already. And they're pushing them like crazy to end users.

bee_rider 381 days ago

> Yes, capex not opex. The cost of running inference is opex.

This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.

Currently these LLM companies train and run models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?

So, is it really obvious what’s capex and what’s opex? In particular:

* The nodes used for training are rented, so that’s opex, right?

* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.

JimDabell 381 days ago

> The nodes used for training are rented, so that’s opex, right?

It’s capex. They are putting money in, and getting an asset out (the weights).

> The models are in some sense consumable?

Assets depreciate.

bee_rider 381 days ago

Obsolete software doesn’t depreciate like obsolete hardware. If an LLM company has trained a truly better model, they can simply make as many copies of their own model as they want. Thus, if the new model is truly better in every way, the old one is completely valueless to them (of course there might be some tradeoffs which mean older models can stick around because they are, say, smaller… but, ultimately they will be valueless after some time).

Because models are still being obsoleted every couple years, old models aren’t an asset. They are an R&D byproduct.

qeternity 381 days ago

> the old one is completely valueless to them

This is of course untrue for the same reason that people are still running Windows 2000.

bee_rider 381 days ago

> This is of course untrue for the same reason that people are still running Windows 2000.

What is the reason?

dcre 381 days ago

They’ve built processes around it and don’t feel like/can’t afford to/don’t know how to change them.

nine_k 381 days ago

> when LLMs become really useful

It looks to me similar to the situation with that newly fashionable WWW thing in, say, 1998. Everybody tried to use it, in search of some magic advantage.

Take a look at the WWW heavyweights today: say, Amazon, Google, Facebook, TikTok, WeChat. Are the web technologies essential for their success? Very much so. But TCP/IP + HTML + CSS + JS are mere tools that enable their real technical and business advantages: logistics and cloud computing, ad targeting, the social graph, content curation for virality, strong vertical integration with financial and social systems, and other such non-trivial things.

So let's wait until a killer idea emerges for which LLMs are a key enabler, but not the centerpiece. Making an LLM the centerpiece is the same thinking that was trying to make catchy domain names the centerpiece, leading to the dot com crash.

rco8786 381 days ago

AWS isn’t doing the training on those models.

JimDabell 381 days ago

OpenAI spends less on training than inference, so the worst case scenario is less than double the cost after factoring in training. Inference is still cheap.

rco8786 381 days ago

Inference is cheap. Training is cheaper. Then where's all the money going? OpenAI is reporting heavy losses, but you're saying the unit economics of inference are all good. What are they spending money on?

jsnell 381 days ago

Their spending is not a problem. It's quite low for a top-tier hard tech company that's also running a consumer service with 500M active users. They are making a loss because 95% of their users are on free accounts, and for now they're choosing not to monetize those users in any way (e.g. ads).

raydev 381 days ago

sama tweeted that the $200 tier was priced too low to cover costs a few months ago.

FergusArgyll 381 days ago

At that price level you run into serious adverse selection

aswegs8 381 days ago

How credible are his PR statements, though?

mediaman 381 days ago

Salary, mostly. It's useful to separate out the GPU cost of training from the salary cost of the people who design the training systems. They are expensive.

That does not mean, however, that inference is unprofitable. The unit economics of inference can be profitable even while the personnel costs of training next-generation models are extraordinary.

JimDabell 380 days ago

> Then where's all the money going?

They are giving vast amounts of inference away as part of their free tier to gain market share. I said inference is cheap, not that it is free. Giving away a large amount of a cheap product costs money.

> you're saying the unit economics of inference are all good

Free tiers do not contradict positive unit economics.

ahtihn 381 days ago

Salaries?

derangedHorse 381 days ago

This is not generally true. Inference costs have just begun to spike starting with the 'test-time scaling' trend[1]. I imagine most OpenAI users are free and the mini models available to them only cost a few cents per task[2]. The chart from The Information featured in this Reddit thread seems more reasonable[3].

Although that was posted in October, so not much time for the reasoning model costs to show up. It's also important to note their revenue is on track to more than double this year[4] and one can't make a complete picture without understanding the revenue spent on the inference provided by these reasoning models.

[1] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[2] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[3] https://www.reddit.com/r/singularity/comments/1g0acku/someho...

[4] https://techcrunch.com/2025/06/09/openai-claims-to-have-hit-...

Gerardo1 381 days ago

> You think AWS are going to subsidise your usage of somebody else’s models

Yes

>indefinitely?

No, and that's the point.

Spivak 381 days ago

Have they ever actually done this? I can't think of a time they've ever actually raised their prices, other than Route53 passing on registrar costs.

sokoloff 381 days ago

They started charging for public IP addresses in early 2024, which was a price increase from zero.

Gerardo1 380 days ago

That assumes that they've been similarly situated with an offering that isn't profitable and has no path to profitability.

What company selling a primarily AI-based service right now is making a profit on that service?

maccard 381 days ago

> Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

Not indefinitely or for any undetermined scale, but AWS regularly subsidise up to 100k [0] in credits. It would not surprise me in the slightest if most. Inference is much cheaper than training and 100k in compute covers a decent amount of usage. Activate is tiered over 3 years so if you want to know the full story, let’s see how many of these services are still around in 18 months. I suspect just like when Games were the flavor of the month, then Crypto, we’ll see the real story when they actually have to pay a bill and their investors aren’t seeing any growth

[0] https://aws.amazon.com/activate/activate-landing/

JimDabell 380 days ago

I added “indefinitely” precisely because I wanted to rule out discussion of the free credits. Those are clearly a loss-leader to get people to choose AWS and aren’t relevant to the true cost of inference.

maccard 380 days ago

The point is that all of these projects are only viable when salaries are VC funded and the opex of inference is close to 0. It’s easy to say that nobody will subsidise inference if you exclude the main subsidies

dragontamer 381 days ago

Purchasing new GPUs is capex but depreciation of GPUs is opex.

There's still a cost, it's just thrown into the future.

burnte 381 days ago

Capex and opex are just accounting labels that help categorize costs and can improve planning ability. But at the end of the day a billion dollars is a billion dollars.

everforward 381 days ago

They’re significant here because opex impacts profits while capex sort of doesn’t. They have a path to profitability if revenue > opex, by quitting growth and slashing capex.

Lots of hand waving, but that’s the idea.

codyb 381 days ago

I believe there's a fair amount of tax implications involved with that bucketing though. Capex is taxed at a lower rate than opex is my understanding but I may be wrong on the specifics of it all.

ceejayoz 381 days ago

> You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.

lhl 381 days ago

Rather than speculating another option is to just measure things. I churned through billions of tokens for evals and synthetic data earlier this year, so I did some of that. On an H100 node, a Llama3 70B FP8 at concurrency=128 generated at about 0.4 J/token (this was estimating node power consumption and multiplying by a generous PUE, 1.2X or something like that) - it was still 120X cheaper than the 48 J/token estimates of cost to run the 175B GPT-3 on 2021-era Microsoft DC1 hardware (Li et al. 2023) and 10X cheaper than the 3-4 J/token empirical measurements to run LLaMA-65B on V100/A100 HPC nodes (Samsi et al 2023).

Anyway, 0.4 J/token at a cost of 5 cents/kWh works out to about 0.5 cents/million tokens. Even at 50% utilization you're only up to 1.1 cents/M tokens. Artificial Analysis reports the current average price of Llama3.3 70B to be about $0.65/M tokens. I'd assume most of the cost you're paying for is probably the depreciation schedule of the hardware.
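The arithmetic in this comment can be reproduced with a few lines of Python. This is only a rough sketch, not lhl's actual script: the function name and the `utilization` parameter are made up for illustration, and it assumes that a node at 50% utilization draws the same power while producing half as many tokens, which is what makes the per-token energy cost double.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def energy_cost_cents_per_million_tokens(joules_per_token, cents_per_kwh, utilization=1.0):
    """Electricity cost, in cents, of generating one million tokens.

    Assumption (not from the thread): idle capacity still draws full power,
    so the per-token cost scales with 1 / utilization.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    kwh_per_million_tokens = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million_tokens * cents_per_kwh / utilization


# Figures quoted in the comment: 0.4 J/token and 5 cents/kWh.
full = energy_cost_cents_per_million_tokens(0.4, 5.0)
half = energy_cost_cents_per_million_tokens(0.4, 5.0, utilization=0.5)

# Energy as a fraction of the quoted $0.65/M token market price (65 cents).
energy_share = full / 65.0

print(f"{full:.2f} cents/M tokens at full utilization")
print(f"{half:.2f} cents/M tokens at 50% utilization")
print(f"{energy_share:.1%} of the quoted price")
```

Under these assumptions electricity comes to under 1% of the quoted price, which is consistent with the guess that hardware depreciation, not power, dominates what you pay.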

Note that of course, modern-day 7B class models stomp on both those older models so you could throw in another 10X lower cost if you're going to quality adjust. Also, I did minimal perf tuning - I used FP8, and W8A8-INT8 is both faster and has slightly better quality (in my functional evals). I also used -tp 8 for my system. -tp 4 w/ model parallelism and cache-aware routing you should also be able to increase throughput a fair amount. Also, speculative decode w/ a basic draft model would give you another boost. And this was tested at the beginning of the year, so using vLLM 0.6.x or so - the vLLM 1.0 engine is faster (better graph building, compilation, scheduling). I'd guess that if you were conscientious about just optimizing you could probably get at least another 2X perf free with basically just "config".

frotaur 381 days ago

My only question about this is the concurrency: is it really easy to leverage it when you need to serve clients without much latency? I don't know much about this.

lhl 381 days ago

Yeah, actually for my batch usage, I usually push to 256+ concurrency, but on H100s at least, currently 64-128 is about the bend of the curve for where latency starts going out of control (this depends a lot on your context length and kvcache optimizations, though).

What I do for testing is that I will run a benchmark_serving sweep (I prefer ShareGPT for a standard set that is slightly more realistic for caching) with desired concurrency (eg 4-1024 or something like that) and then plot TTFT vs Total Throughput and graph Mean, P50, and P99 - this will give you a clear picture of what your concurrency/throughput is for a given desired latency.
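One way to turn such a sweep into a decision is to pick the highest-throughput concurrency whose P99 TTFT still fits a latency budget. The sketch below is only an illustration of that selection step: the `SweepPoint` type, the `best_under_latency` helper, and every number in `sweep` are hypothetical placeholders, not measurements from the thread or output from any benchmark tool.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SweepPoint:
    """One row of a concurrency sweep (hypothetical schema)."""
    concurrency: int
    p99_ttft_ms: float       # 99th percentile time to first token
    throughput_tok_s: float  # total generated tokens per second


def best_under_latency(points: Iterable[SweepPoint], max_p99_ttft_ms: float) -> Optional[SweepPoint]:
    """Return the highest-throughput point within the latency budget, or None."""
    within_budget = [p for p in points if p.p99_ttft_ms <= max_p99_ttft_ms]
    if not within_budget:
        return None
    return max(within_budget, key=lambda p: p.throughput_tok_s)


# Placeholder numbers for illustration only; substitute your own sweep results.
sweep = [
    SweepPoint(4, 80.0, 400.0),
    SweepPoint(16, 120.0, 1400.0),
    SweepPoint(64, 300.0, 4000.0),
    SweepPoint(128, 900.0, 5500.0),
    SweepPoint(256, 4000.0, 6000.0),
]

choice = best_under_latency(sweep, max_p99_ttft_ms=1000.0)
print(choice)
```

With these made-up values the throughput gain from 128 to 256 is small while P99 TTFT rises sharply, which is the kind of "bend of the curve" the plot is meant to expose.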

ceejayoz 381 days ago

Yes, if we discount the billion or so Facebook spent to train Llama3.

lhl 381 days ago

No, let's add it. The cost for an inference provider to deploy an existing, trained, weights-available model is $0 (or whatever you want to add for the HF download of the weights). Open weight models simply exist now. Deal with it?

If you would like to somehow add that as a line item, perhaps you should add the full embodied energy cost of Linux (please include the entire history of compute since it wouldn't exist without UNIX), or perhaps the full military industrial complex costs from the invention of the transistor? We could go further.

timschmidt 381 days ago

I love it! Can't forget the accumulated carbon costs of all the experimentation it took to master fire, ceramics, and metals smelting.