| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by xxbondsxx 381 days ago

You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).

Sure there might not be any analysis that proves that they subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.

You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).

(Also you can reasonably serve search queries off of CPUs with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users since any one token could make a huge difference in the output)

15 comments

JimDabell 381 days ago

> you also don't have any evidence that they are profitable.

Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.

Yes, capex not opex. The cost of running inference is opex.

antman 381 days ago

No we don't, MS used their OpenAI position as a strategy to increase Azure adoption. I am surprised AWS didn't give ls for free

wkat4242 380 days ago

Microsoft's plans for openai are much bigger than just Azure. They have 30 products/services with copilot in the name already. And they're pushing them like crazy to end users.

bee_rider 381 days ago

> Yes, capex not opex. The cost of running inference is opex.

This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.

Currently these LLM companies train and models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?

So, is it really obvious what’s capex and what’s opex? In particular:

* The nodes used for training are rented, so that’s opex, right?

* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.

JimDabell 381 days ago

> The nodes used for training are rented, so that’s opex, right?

It’s capex. They are putting money in, and getting an asset out (the weights).

> The models are in some sense consumable?

Assets depreciate.

bee_rider 381 days ago

Obsolete software don’t depreciate like obsolete hardware. If an LLM company has trained a truly better model, they can simply make as many copies of their own model as they want. Thus, if the new model is truly better in every way, the old one is completely valueless to them (of course there might be some tradeoffs which mean older models can stick around because they are, say, smaller… but, ultimately they will be valueless after some time).

Because models are still being obsoleted every couple years, old models aren’t an asset. They are an R&D byproduct.

qeternity 381 days ago

> the old one is completely valueless to them

This is of course untrue for the same reason that people are still running Windows 2000.

bee_rider 381 days ago

> This is of course untrue for the same reason that people are still running Windows 2000.

What is the reason?

nine_k 381 days ago

> when LLMs become really useful

It looks to me similar to the situation with that newly fashionable WWW thing in, say, 1998. Everybody tried to use it, in search of some magic advantage.

Take a look at the WWW heavyweights today: say, Amazon, Google, Facebook, TikTok, WeChat. Are the web technologies essential for their success? Very much so. But TCP/IP + HTML + CSS + JS are mere tools that enable their real technical and business advantages: logistics and cloud computing, ad targeting, the social graph, content curation for virality, strong vertical integration with financial and social systems, and other such non-trivial things.

So let's wait until a killer idea emerges for which LLMs are a key enabler, but not the centerpiece. Making an LLM the centerpiece is the same thinking that was trying to make catchy domain names the centerpiece, leading to the dot com crash.

rco8786 381 days ago

AWS isn’t doing the training on those models.

JimDabell 381 days ago

OpenAI spends less on training than inference, so the worst case scenario is less than double the cost after factoring in training. Inference is still cheap.

rco8786 381 days ago

Inference is cheap. Training is cheaper. Then where's all the money going? OpenAI is reporting heavy losses, but you're saying the unit economics of inference are all good. What are they spending money on?

jsnell 381 days ago

Their spending is not a problem. It's quite low for a top-tier hard tech company that's also running a consumer service with 500M active users. They are making a loss because 95% of their users are on free accounts, and for now they're choosing not to monetize those users in any way (e.g. ads).

raydev 381 days ago

sama tweeted that the $200 tier was priced too low to cover costs a few months ago.

mediaman 381 days ago

Salary, mostly. It's useful to separate out the GPU cost of training from the salary cost of the people who design the training systems. They are expensive.

That does not mean, however, that inference is unprofitable. The unit economics of inference can be profitable even while the personnel costs of training next-generation models are extraordinary.

JimDabell 380 days ago

> Then where's all the money going?

They are giving vast amounts of inference away as part of their free tier to gain market share. I said inference is cheap, not that it is free. Giving away a large amount of a cheap product costs money.

> you're saying the unit economics of inference are all good

Free tiers do not contradict positive unit economics.

ahtihn 381 days ago

Salaries?

derangedHorse 381 days ago

This is not generally true. Inference costs have just began to spike starting with the 'test-time scaling' trend[1]. I imagine most OpenAI users are free and the mini models available to them only cost a few cents per task[2]. The chart from The Information featured in this Reddit thread seems more reasonable[3].

Although that was posted in October, so not much time for the reasoning model costs to show up. It's also important to note their revenue is on track to more than double this year[4] and one can't make a complete picture without understanding the revenue spent on the inference provided by these reasoning models.

[1] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[2] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[3] https://www.reddit.com/r/singularity/comments/1g0acku/someho...

[4] https://techcrunch.com/2025/06/09/openai-claims-to-have-hit-...

Gerardo1 381 days ago

> You think AWS are going to subsidies your usage of somebody else’s models

Yes

>indefinitely?

No, and that's the point.

Spivak 381 days ago

Have they ever actually done this? I can't think of a time they've actually raised their prices ever that isn't the Route53 passing on registrar costs.

sokoloff 381 days ago

They started charging for public IP addresses in early 2024, which was a price increase from zero.

Gerardo1 380 days ago

That assumes that they've been similarly situated with an offering that isn't profitable and has no path to profitability.

What company selling a primarily AI-based service right now is making a profit on that service?

maccard 381 days ago

> Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

Not indefinitely or for any undetermined scale, but AWS regularly subsidise up to 100k [0] in credits. It would not surprise me in the slightest if most. Inference is much cheaper than training and 100k in compute covers a decent amount of usage. Activate is tiered over 3 years so if you want to know the full story, let’s see how many of these services are still around in 18 months. I suspect just like when Games were the flavor of the month, then Crypto, we’ll see the real story when they actually have to pay a bill and their investors aren’t seeing any growth

[0] https://aws.amazon.com/activate/activate-landing/

JimDabell 380 days ago

I added “indefinitely” precisely because I wanted to rule out discussion of the free credits. Those are clearly a loss-leader to get people to choose AWS and isn’t relevant to how the true cost of inference.

maccard 380 days ago

The point is that all of these projects are only viable when salaries are VC funded and the opex of inference is close to 0. It’s easy to say that nobody will subsidise inference if you exclude the main subsidies

dragontamer 381 days ago

Purchasing new GPUs is capex but depreciation of GPUs is opex.

There's still a cost, it's just thrown into the future.

burnte 381 days ago

Capex and opex are just accounting labels that help categorize costs and can improve planning ability. But at the end of the day a billion dollars is a billion dollars.

everforward 381 days ago

They’re significant here because opex impacts profits while capex sort of doesn’t. They have a path to profitability if revenue > opex, by quitting growth and slashing capex.

Lots of hand waving, but that’s the idea.

codyb 381 days ago

I believe there's a fair amount of tax implications involved with that bucketing though. Capex is taxed at a lower rate than opex is my understanding but I may be wrong on the specifics of it all.

ceejayoz 381 days ago

> You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.

lhl 381 days ago

Rather than speculating another option is to just measure things. I churned through billions of tokens for evals and synthetic data earlier this year, so I did some of that. On an H100 node, a Llama3 70B FP8 at concurrency=128 generated at about 0.4 J/token (this was estimating node power consumption and multiplying by a generous PUE, 1.2X or something like that) - it was still 120X cheaper than the 48 J/token estimates of cost to run the 175B GPT-3 on 2021-era Microsoft DC1 hardware (Li et al. 2023) and 10X cheaper than the 3-4 J/token empirical measurements to run LLaMA-65B on V100/A100 HPC nodes (Samsi et al 2023).

Anyway, at 0.4 J/token, at a cost of 5 cents/kWh, is about 0.5 cents/million tokens. Even at 50% utilization you're only up to 1.1 cents/M tokens. Artificial Analysis reports the current average price of Llama3.3 70B to be about $0.65/M tokens. I'd assume most of the cost you're paying for is probably the depreciation schedule of the hardware.

Note that of course, modern-day 7B class models stomp on both those older models so you could throw in another 10X lower cost if you're going to quality adjust. Also, I did minimal perf tuning - I used FP8, and W8A8-INT8 both is faster and has slightly better quality (in my functional evals). I also used -tp 8 for my system. -tp 4 w/ model parallelism and cache-aware routing you should also be able to increase throughput a fair amount. Also, speculative decode w/ a basic draft model would give you another boost. And this was tested at the beginning of the year, so using vLLM 0.6.x or so - the vLLM 1.0 engine is faster (better graph building, compilation, scheduling). I'd guess that if you were conscientious about just optimizing you could probably get at least another 2X perf free with basically just "config".

frotaur 381 days ago

My only question about this is the concurrency : is it really easy to leverage it when you need to serve to clients without much latency ? I don't know much about this.

lhl 381 days ago

Yeah, actually for my batch usage, I usually push to 256+ concurrency, but on H100s at least, currently 64-128 is about the bend of the curve for where latency starts going out of control (this depends a lot on your context length and kvcache optimizations, though).

What I do for testing is that I will run a benchmark_serving sweep (I prefer ShareGPT for a standard set that is slightly more realistic for caching) with desired concurrency (eg 4-1024 or something like that) and then plot TTFT vs Total Throughput and graph Mean, P50, and P99 - this will give you a clear picture what your concurrency/throughput for a given desired latency.

ceejayoz 381 days ago

Yes, if we discount the billion or so Facebook spent to train Llama3.

lhl 381 days ago

No, let's add it. The cost for an inference provider to deploy a trained and weights available existing model is $0 (or whatever you want to add for the HF download of the weights). Open weight models simply exist now. Deal with it?

If you would like to someone add that somehow as a line item, perhaps you should add the full embodied energy cost of Linux (please include the entire history of compute since it wouldn't exist without UNIX), or perhaps the full military industrial complex costs from the invention of the transistor? We could go further.

timschmidt 381 days ago

I love it! Can't forget the accumulated carbon costs of all the experimentation it took to master fire, ceramics, and metals smelting.

Palmik 381 days ago

> API that is likely a loss-leader to grab market share (hosted LLM cloud models).

I don't think so, not anymore.

If you look at API providers that host open-source models, you will see that they have very healthy margin between their API cost and inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.

As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.

[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware wise. Even assuming use of off-the-shelf software, they are doing well.

lambda 381 days ago

You're leaving out their training costs. And while you might say "well, once they're trained you don't have to spend more on that", but as we've seen they have to keep training new models on new data, such as current events and new language features and APIs. And some aspects of that training are becoming more costly, or more scarce, as companies like Reddit and Stackoverflow restrict and sell their data, less data gets produced on Stackoverflow as people switch to using LLMs instead, website operators go to more extreme measures to block AI scrapers that ignore robots.txt, etc.

Yeah, people tout RAG and fine tuning, but lots of people just use the base chat model, if it doesn't keep up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?

Xmd5a 381 days ago

I use whisper to transcribe long conversations, and deploying the model myself on vastai is ten times cheaper than OpenAI's API offer.

indigodaddy 380 days ago

I’m assuming doing transcription on a vast GPU is also ten+ times faster than local options?

https://news.ycombinator.com/item?id=44225953

noodletheworld 381 days ago

I don’t completely disagree, but “assertion one” [1]

[1] ~ you can obviously verify this yourself by doing it yourself and seeing how expensive it is.

…is an enormously weak argument.

You suppose. You guess. We guess.

Let’s be honest, you can just stop at:

> I don’t think so.

Fair. I don’t either; but that’s about all we can really get at the moment afaik.

Palmik 381 days ago

No, the point of [1] is that this is not some "secret knowledge". My response is based on running models in production and comparing my costs with the costs I would pay to API providers running the same models.

vslira 381 days ago

he's not wrong, if you can run a open weights model in any cloud, you can very straightforwardly estimate the cost of running the model. considering that these providers either use long-term contracts or maybe even buy their own hardware, this theoretical cloud deployment is itself an overestimate of the costs

noodletheworld 380 days ago

…and its perfectly legit to run that, write the numbers down and link to it.

But:

A) it makes absolutely no difference to the fact you have no idea what the big LLM providers are actually doing.

B) Just asserting some random thing and saying “anyone competent can verify this themselves” is a weak argument. Youre saying youve done the research, but failing to provide any evidence you actual have

If youve crunched the numbers then man up and post them.

If not, then stop at “I think…”

“This is based on my experience running production workloads…” is a nice way of saying “I dont have any data to backup what Im saying”.

If you did, you could just link to it.

…by not posting data you make your argument non-falisifyable.

It is just an oppinion.

xxbondsxx 381 days ago

For example, Perplexity has been fudging their accounting numbers to shift COGS to R&D to make their margin appear profitable: https://thedeepdive.ca/did-perplexity-fudge-its-numbers/

pama 381 days ago

Please read the DeepSeek analysis of their API service (linked in this article): they have 500% profit margin and they are cheaper than any of the US companies serving the same model. It is conceivable that the API service of OpenAI or Anthropic have much higher profit margins yet.

(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)

hedayet 381 days ago

That claim actually gives me pause. It reminds me of an idea from Zero to One by Peter Thiel - that real monopolies like to appear as a small fish in a very big pond, while tiny players try to appear as a monopoly.

So when I see a company bragging about "500% profitability," I can’t help but wonder if they’re even profitable at all.

withinboredom 381 days ago

I imagine pretty much none of them are profitable in the real accounting sense. However, if they all turned off their free plans -- they'd be insanely profitable.

pama 381 days ago

Please read their report. There is no bragging. It just tries to document performance and clarify a misconception. The concept that LLM inference may not be profitable or may be energy inefficient has been a constant song of misinformation for reasons that I dont understand. DeepSeek does indeed pretend to be of similar quality to others, but the work of their relatively small team is truly outstanding. As per a parallel thread, their result has by now been almost replicated by the sglang team. Link here: https://lmsys.org/blog/2025-05-05-large-scale-ep/

SEGyges 381 days ago

Every LLM provider caches their KV-cache, it's a publicly documented technique (go stuff that KV in redis after each request, basically) and a good engineering team could set it up in a month.

chipsrafferty 380 days ago

Are you saying if I ask a prompt "foo" and then a month later another user asks "foo" then it retrieves a cached value?

wkat4242 380 days ago

No, the key value cache is the context in a way the model can read it.

iamnotagenius 381 days ago

With all due respect to Deepseek, I would take their numbers with grain of salt, as they might as well be politically motivated.

jarym 381 days ago

Any more politically motivated than a model from anywhere else?

pama 381 days ago

The current version of sglang allows inference with the R1 model at a cost that is very close to the rate that DeekSeep claimed (using H100s, not exactly the DeepSeek compute). Their claim is almost validated by replication at this point so there is nothing left to take with a grain of salt other than the possibility that there exists potentially an even higher margin than what they claimed if one were to optimize for modern NVidia hardware.

WithinReason 381 days ago

is that better or worse than commercially motivated?

leeoniya 381 days ago

commercial motivatation needs to show eventual profit to be sustainable, while political does not.

though at the outset (pre-profit / private) it's hard to say there's much difference.

bee_rider 381 days ago

> though at the outset (pre-profit / private) it's hard to say there's much difference.

I think this is the tough part, we’re at the outset still.

Also, a political investment could could be sustainable, in the sense that China might decide they are fine running Deepseek at a loss indefinitely, if that’s what’s going on (hypothetically. Actually I have never seen any evidence to suggest Deepseek is subsidized, although I haven’t gone looking).

lazide 381 days ago

Also, solar panel dumping as a quite successful example (on many, many fronts).

raincole 381 days ago

> an API that is likely a loss-leader to grab market share (hosted LLM cloud models)

Everyone just repeats this but I never buy it.

There is literally a service that allows you to switch models and service providers seamlessly (openrouter). There is just no lock-in. It doesn't make any financial sense to "grab market share".

If you sell something with UI, like ChatGPT (the web interface) or Cursor, sure. But selling API at a loss is peak stupidity and even VCs can see that.

zdp7 381 days ago

You can't switch to a competitor that went out of business. If you low ball your rates, it starves startups of needed funds

raxxorraxor 380 days ago

Depends on the business case. LLM slowly creep into several workflows and here determinism becomes more important than the latest abilities in reasoning.

People start to let their LLM parse text content. Be that mails, chats or transcriptions, the models often need to formalize their output and switching models can become burdensome, while developers might switch models on a whim.

Doesn't mean you can capture a market by selling cheap though.

mupuff1234 381 days ago

Except they most likely do have a plan to make it harder to switch.

DarmokJalad1701 381 days ago

Who is "they"? It makes no sense for Openrouter to allow providers that do not conform to the API. They profit from the commission from the fees and not providing inference.

raincole 381 days ago

Yeah, sure, please elaborate on how providers such as Fireworks, DeepInfra, Chutes are going to "make it harder to switch."

mupuff1234 381 days ago

I'm talking about openAI, anthropic, Google, etc.

They'll offer consumer and enterprise integrations that will only work with their models.

hedayet 381 days ago

yes. And they will try both carrots and sticks.

The carrots are already visible - think abstractions like "projects" in ChatGPT.

julianeon 381 days ago

This is also the argument of the guy in the article, fyi (it's not a loss leader, no reason for it to be).

cush 381 days ago

> You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).

Regardless of maturity lifecycle, by definition loss-leaders are cheap. If I go to the grocery store and milk is $1, I don't think I'm being swindled. I know it's a loss-leader and I buy it because it's cheap.

We are currently in the early-Netflix-massive-library-for-five-dollars-a-month era of LLMs and I'm here for it. Take all you can grab right now because prices will 100x over the next two years.

jackdeansmith 381 days ago

Want to bet? I'll give you 5:1 odds that tokens from a model with some specific benchmark performance (we can sort out the specific benchmarks or basket of benchmarks if you want to bet) will be cheaper two years from now.

cush 381 days ago

Sure. To clarify, I'm not asserting that in two years today's 4o will be 100x more expensive, but the sum of many core offerings from various companies will be. It won't be unheard of for people to spend $2k-$10k/yr between many AI services

chipsrafferty 380 days ago

It's not unheard of for people to spend $13,000 on a pure silver frying pan. Is it common? No.

cush 379 days ago

Like say 10% of people or more

jackdeansmith 380 days ago

I misunderstood your comment then, my assertion is that models which have the same capabilities as current models will be cheaper in the future. I have no doubts that models with more capabilities will be more expensive.

jstummbillig 381 days ago

There is also a lot of different models at a lot of different price points (and LLMs are fairly hard to compare to begin with). In this theory of a likely loss-leader, must we assume that all of them, from all companies, are priced below cost...? If so, that seems like a fairly wild claim. What's Step 2 for all of these companies to get ahead of this, given how model development currently works?

I think the far more reasonable assumption is: It's profitable enough to not get super nervous about the existence of your company. You have to build very costly models and build insanely costly infrastructure. Running all of that at a loss without an obvious next step, because ALL of them are pricing to not even make money at inference, seems to require a lot of weird ideas about how companies are run.

otterley 381 days ago

We’ve seen this pattern before. This happened in the 1990s during the original dot-com boom. Investors gamble, everything is subsidized, most companies fail, and the ones left standing then raise prices.

dietr1ch 381 days ago

I don't think it's that wild. Hardware will improve together with performance, but once the market stops expanding and behaviour gets stagnant the market shares will solidify, so you better aim to have a large portion to make the scale together with the improvements help reach profitability.

int_19h 381 days ago

The problem with this theory in general is that, given the sheer number of cloud inference providers (most of which are hosting third party models), it would be exceedingly strange if not only all of them are engaging in this same tactic, but apparently all of them have the same financial capacity to do so.

ddp26 381 days ago

I analyzed OpenAI API profitability in summer 2024 and found inference for gpt-4 class models likely pretty profitable, ~50% gross margins (ignoring capex for training models): https://futuresearch.ai/openai-api-profit

otterley 381 days ago

That’s a little like saying you can compute the profitability of the energy market by looking only at the margins of gas stations. You can’t exclude all the outlays on actually acquiring the product to sell.

lazide 381 days ago

Sure - but is there any doubt in that example that gas stations are making a profit?

And unlike gasoline, once models are trained there is no significant ongoing production cost.

otterley 381 days ago

Models aren't static. In order for them to remain relevant, they have to be constantly retrained with new data. Plus there's a model arms race going on and which will probably continue for the foreseeable future.

lazide 380 days ago

Fair point - though various distilling and retraining tricks do reduce the cost quite a bit. It’s not like everyone is doing all the work they had to do from scratch, every time.

lumost 381 days ago

We don’t know what the marginal cost of inference is yet however. So far, users are demonstrating that they are willing to pay more for LLMs than traditional web experiences.

At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll lose another 10-100x off the cost of inference in the next 2 years.

jongjong 381 days ago

Yep spot on. Price does not equate cost. Especially in our current economy where profit has been artificially made a non-factor. To know the cost, you'd have to look at hardware resource usage per query. Given that recent models have over a trillion parameters, you need a huge amount of memory and CPU to process a query to get the electrons to traverse all these thousands of billions of ANN nodes and/or weights.

Ultimately, it may turn out that dumber models may be more economically efficient than smarter models once you ignore the investment subsidy factor.

Maybe, given the current state of AI, the economically efficient situation is to have lots of dumb LLMs to solve small, well-defined problems and leave the really difficult problems to humans.

Current approach, looking at pricing is assuming another AI breakthrough is just around the corner.

TZubiri 381 days ago

This is addressed in the article. Giving arguments for llms being profitable as APIs.

n4r9 381 days ago

One of those arguments is:

> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly

The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).

Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.

unilynx 381 days ago

> Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.

I'm sure they've already found ways to do that, injecting relevant ads is just a form of RAG.

But they won't risk it yet as long as they're still grabbing market share just like Google didn't run them at the start - and kept them unobtrusive until their search won.

pr337h4m 381 days ago

Uber and Lyft rely on network effects, which do not exist in any meaningful sense for LLM API providers.

n4r9 380 days ago

Yeah, that's definitely a factor in the attempt to "undercut and outlast". I guess I have two defenses: firstly, network effects might not be crucial, it might be enough for there to be a small cost to changing provider; secondly, I imagine the providers are finding ways to use network effects to bolster adoption - e.g. "Find me a party date when all my friends are free, book the catering and message them with invites".

nitwit005 381 days ago

Brand is huge in every market. It's hard to get people to visit your website at all. People know about OpenAI, and look it up.

TZubiri 380 days ago

No network effect + already profitable.

Not at all like Uber, let it go

nitwit005 380 days ago

Incoherent response.

username223 381 days ago

It's addressed poorly.

> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,

What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.

See above. Just like any other Cloud service, you tie clients to your API.

> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.

80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)

Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.

jsnell 381 days ago

> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

That's not really how the LLM API market works. The interfaces themselves are pretty trivial and have no real lock-in value, and there's plenty of adapters around anyway. (Often first-party, e.g. both Anthropic and Google provide OpenAI-compatible APIs). There might initially have been theories that you could not easily move to a different model, creating lock-in, but in practice LLMs are so flexible and forgiving about the inputs that a different model can be just dropped in an work without any model-specific changes.

> 80% margin on GPU cost? What about after paying for power, facilities

The market price of renting that compute on the market. That's fully loaded, so would include a) pro-rated recouping the capital cost of the GPUs, b) the power, cooling, datacenter buildings, etc, c) the hosting provider's margin.

> admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

Pretty likely! In OpenAI's leaked 2024 financial plan the compute costs were like 75% of their projected costs.

jongjong 381 days ago

Yep, agreed, it's quite different with LLMs since the endpoints are very straightforward.

It's kind of unfair how little lock in factor there is at the base layer. Those doing the hardest, most innovative work have no way to differentiate themselves in the medium or long run. It's just unlikely that one person or company will keep making all the innovations. There is an endless stream of newcomers who will monetize on top of someone else's work. If anyone obtains a lock-in, it will not be through innovation. But TBH, it kind of mirrors the reality of the tech industry as a whole. Those who have been doing the innovation tend to have very little lock in. They are often left on the streets. In the end, what counts financially is the ability to capture eyeballs and credit cards. Innovation only provides a temporary spike.

With AI, even for a highly complex system, you'll end up using maybe 3 API endpoints; one for embeddings, one for inference and one for chat... You barely need to configure any params. The interface to LLMs is actually just human language; you can easily switch providers and take all your existing prompts, all your existing infra with you... Just change the three endpoint names, API key and a couple of params and you're done. Will take a couple of hours at most to switch providers.

username223 381 days ago

> The market price of renting that compute on the market. That's fully loaded,

Sorry, I totally misread your post. Charging 80% on top of server rental isn't so bad, especially since I'm guessing there are significant markups on GPU rental given all the AI demand.

petesergeant 381 days ago

> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

Have you used any of these APIs? There's very little lock-in for inference. This isn't like setting up all your automation on S3, if you use the right library it's changing a config file.

Workaccount2 381 days ago

Just wait till there are ads for free users, which is going to happen. Depending on how insidious these ads are, they could be extremely profitable too, like recommending products and services directly in context.

Sevii 381 days ago

They could dynamically update the system prompt with ad content on a per request basis. Lots of options.

slt2021 381 days ago

most likely you will be targeted with ads based on what you give to the model. if you ask chatgpt about electric cars, expect a wave of ads coming at you from EV automakers from all channels: socials, media, email, mail, etc - trying to close you on their car brand

handfuloflight 381 days ago

Why do you equate contextual with insidious?

loudmax 381 days ago

The OP is not equating contextual with insidious. They're pointing out, correctly, that contextual ads can be insidious. And if they're profitable, they probably will be.

A lot of the companies offering LLM services are in a race gain market share and build expertise. Right now, they can burn through millions of dollars of VC money, with the expectation that they'll turn a profit at some point in the future. If that profit comes from advertising, and critically, if users don't expect advertising in their free LLMs, because they didn't see ads in generated output in the past, that will be very insidious.

handfuloflight 381 days ago

> If that profit comes from advertising, and critically, if users don't expect advertising in their free LLMs, because they didn't see ads in generated output in the past, that will be very insidious.

Are the free LLM providers offering their service with a contractual obligation to the users that they will not add advertising to the outputs? If not, how is it insidious?

What definition of insidious are you using per https://www.merriam-webster.com/dictionary/insidious?

wredcoll 381 days ago

Weirdly, no part of that merriam-webster link includes the word "contract". I'm not sure you know how words work.

handfuloflight 381 days ago

Why does it need to include the word "contract"?

Workaccount2 381 days ago

Because then the AI isn't working for you anymore, it's working for the advertisers. Which isn't necessarily bad, but we can be pretty confident that the AI will not be upfront about this, and instead try to act like it's working for you.

handfuloflight 381 days ago

If the advertising is contextually relevant, how is it working against you?

pigeons 381 days ago

Just being contextually relevant doesn't mean its in your interests as opposed to in the interests of the advertised or that the levers are transparent.

handfuloflight 381 days ago

Are you assuming all commercial relationships are adversarial? Why can't advertisers and those advertised to have aligned interests? What transparent levers do non-advertised results have, how do you know search rankings don't have hidden commercial incentives? Why trust undisclosed bias over disclosed relationships? Isn't transparency about incentives better than pretending they don't exist?

luqtas 381 days ago

+ not considering the amount of copyright violations on the training weights, if it was easy and cheap for the masses to use the judiciary system, maybe this technology would be way behind of what it's "capable"

ozim 381 days ago

I think you can make an educated guess if you check local model performance, prices of energy and hardware and price of the subscriptions.

Best part is you can make perplexity research task out of it