Hacker News new | ask | show | jobs
by username223 381 days ago
It's addressed poorly.

> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,

What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.

See above. Just like any other Cloud service, you tie clients to your API.

> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.

80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)

Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.

2 comments

> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

That's not really how the LLM API market works. The interfaces themselves are pretty trivial and have no real lock-in value, and there's plenty of adapters around anyway. (Often first-party, e.g. both Anthropic and Google provide OpenAI-compatible APIs). There might initially have been theories that you could not easily move to a different model, creating lock-in, but in practice LLMs are so flexible and forgiving about the inputs that a different model can be just dropped in an work without any model-specific changes.

> 80% margin on GPU cost? What about after paying for power, facilities

The market price of renting that compute on the market. That's fully loaded, so would include a) pro-rated recouping the capital cost of the GPUs, b) the power, cooling, datacenter buildings, etc, c) the hosting provider's margin.

> admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

Pretty likely! In OpenAI's leaked 2024 financial plan the compute costs were like 75% of their projected costs.

Yep, agreed, it's quite different with LLMs since the endpoints are very straightforward.

It's kind of unfair how little lock in factor there is at the base layer. Those doing the hardest, most innovative work have no way to differentiate themselves in the medium or long run. It's just unlikely that one person or company will keep making all the innovations. There is an endless stream of newcomers who will monetize on top of someone else's work. If anyone obtains a lock-in, it will not be through innovation. But TBH, it kind of mirrors the reality of the tech industry as a whole. Those who have been doing the innovation tend to have very little lock in. They are often left on the streets. In the end, what counts financially is the ability to capture eyeballs and credit cards. Innovation only provides a temporary spike.

With AI, even for a highly complex system, you'll end up using maybe 3 API endpoints; one for embeddings, one for inference and one for chat... You barely need to configure any params. The interface to LLMs is actually just human language; you can easily switch providers and take all your existing prompts, all your existing infra with you... Just change the three endpoint names, API key and a couple of params and you're done. Will take a couple of hours at most to switch providers.

> The market price of renting that compute on the market. That's fully loaded,

Sorry, I totally misread your post. Charging 80% on top of server rental isn't so bad, especially since I'm guessing there are significant markups on GPU rental given all the AI demand.

> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

Have you used any of these APIs? There's very little lock-in for inference. This isn't like setting up all your automation on S3, if you use the right library it's changing a config file.