| HN Mirror

blahblaher 31 days ago

"It costs OpenAI less money to serve GPT-5.5 than GPT-4." does it though? do you have the numbers? Or you just making stuff up?

[flagged]

ralusek 31 days ago

We used to not know, but now because open source models are being hosted and served by people whose only incentive is making profit on directly running inference, we have a ballpark idea.

There's no reason to think that the latest frontier models have similar inference costs to open source models.

It would be more surprising if the surrounding architecture hasn't significantly diverged. If it _hasn't_ significantly diverged, then given the performance difference it would imply that the frontier models have significantly greater param counts, which would result in a higher cost.

sarchertech 31 days ago

No, we don't know that the open source inference market isn't being kept artificially cheap by operators running at a loss in the hope of gaining market share. All it takes is a few, and everyone else has to lower prices to compete while they hope for lower costs and for the subsidies to dry up.

We also have to assume that these operators are correctly pricing GPU depreciation, and the market is so new there is no reason to believe they are.

https://simianwords.bearblog.dev/conclusive-proofs-that-llm-...

simianwords 31 days ago

GPT-4 (original API):

Input: $30 / 1M tokens

Output: $60 / 1M tokens

GPT-5.5:

Input: $5 / 1M tokens

Output: $30 / 1M tokens

Costs have been falling by over 5x year over year. Inference cost concern is mostly performative.

Edit: can't reply but companies aren't selling inference at loss. In the blog post I point to third party hosting of open models like Deepseek which are also going down. They are not VC backed.

I also point to Gemma 31B which you can run on your laptop today that beats most models from 2024.
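For reference, the ratios implied by the list prices quoted above can be checked with a few lines of Python. This is only a sketch: the prices are the ones given in this comment, while the 75/25 input/output mix and the helper names (`price_ratio`, `blended_price`) are illustrative assumptions. It compares list prices only and says nothing about what it costs to serve either model.

```python
# List prices as quoted in the comment above, in USD per 1M tokens.
PRICES = {
    "GPT-4 (original API)": {"input": 30.0, "output": 60.0},
    "GPT-5.5": {"input": 5.0, "output": 30.0},
}


def price_ratio(old: str, new: str, kind: str) -> float:
    """How many times cheaper `new` is than `old` for one token kind."""
    return PRICES[old][kind] / PRICES[new][kind]


def blended_price(model: str, input_share: float) -> float:
    """Price per 1M tokens for a mix of input and output tokens.

    `input_share` is a hypothetical fraction of tokens that are input.
    """
    p = PRICES[model]
    return input_share * p["input"] + (1.0 - input_share) * p["output"]


input_ratio = price_ratio("GPT-4 (original API)", "GPT-5.5", "input")
output_ratio = price_ratio("GPT-4 (original API)", "GPT-5.5", "output")

# Assumed workload: 75% input tokens, 25% output tokens.
old_blended = blended_price("GPT-4 (original API)", 0.75)
new_blended = blended_price("GPT-5.5", 0.75)

print(f"input: {input_ratio:.1f}x cheaper, output: {output_ratio:.1f}x cheaper")
print(f"blended: ${old_blended:.2f} -> ${new_blended:.2f} per 1M tokens")
```

On these numbers the input price fell 6x and the output price 2x, so the overall drop depends heavily on the input/output mix of the workload.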

zamalek 31 days ago

What they charge people says nothing about what it costs them. Off the top of my head, one confounding factor is trying to win back market share from Anthropic.

We will only know the actual situation once Anthropic goes public and we can look at their books.

XenophileJKO 31 days ago

"Neither Mr. Edison nor anyone else can override the well-known laws of Nature, and when he is made to say that the same wire which brings you light will also bring you power and heat, there is no difficulty in seeing that more is promised than can possibly be performed. To talk about cooking food by heat derived from electricity is absurd."

stavros 31 days ago

Wait, this person knew that the wire could bring you light, but not that it could bring you heat? Hadn't they noticed that light bulbs heat up?

rcxdude 31 days ago

It could be a reasonable argument from the point of view of scale: you need a lot more energy for cooking than for lighting (even with incandescent lightbulbs, though they were a fair bit dimmer and colder in the earlier days of them).

Gooblebrai 31 days ago

Good quote. Doesn't apply well to this situation tho.

rafaelero 31 days ago

I think it's pretty safe to assume they are not losing money on inference.

multjoy 31 days ago

I think it’s safe to assume that they are bleeding cash.

basilgohar 31 days ago

Based on what? They haven't even IPOed.

It's Silicon Valley and they are trying to grow aggressively. Your baseline assumption should be the exact opposite.

The price a company charges, _particularly_ a high growth VC-backed one, is a poor signal of its costs.

That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.

simianwords 31 days ago

> That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.

Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox. Deepseek is much cheaper and way more performant. If you can agree to two reasonable assumptions:

1. that closed source models are more efficient than open source

2. Deepseek is served at a profit and not a loss

then it is pretty clear that the prices have gone down. If the prices have gone down more than 20x-30x, then surely it is not _still_ subsidised, is it?

I think this amount of skepticism is not warranted here. Meeting every reasonable explanation or proxy with "but you don't know what they really do" is naive.

It is borderline conspiratorial to believe it this way.

Den_VR 31 days ago

I don’t find it at all reasonable to assume that closed source models are more efficient. The people involved had different circumstances, and that naturally affects their work.

> 1. that closed source models are more efficient than open source

Not a reasonable assumption for a variety of reasons.

> 2. Deepseek is served at a profit and not a loss

Not a reasonable assumption either.

> Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox.

Because the internals are what actually matter and what drives inference cost.

It would be entirely reasonable to expect that GPT-5.5 has some sort of optimizations or changes to the architecture to make it easier to train, or to make runtime ablation easier, or to better handle large batches, or whatever.

Those changes, particularly if they are non-public, can easily result in worse inference performance than a comparably sized model without those changes.

> It is borderline conspiratorial to believe it this way.

It's not any sort of conspiracy. It's how land-grab tech companies have always worked. To presume otherwise is silly.

Ygg2 31 days ago

That's pricing.

Pricing has no correlation with profit. It can be artificially lowered to kill competition, and artificially inflated to maximize profit.

philipallstar 31 days ago

It definitely correlates with profit. It doesn't correlate with cost, at least when you have VC money to burn.

IncRnd 31 days ago

If you go to https://developers.openai.com/api/docs/pricing, you will see the actual prices, which do not match what you posted:

GPT-4.1 Input: $2.00 / 1M Tokens Output: $8.00 / 1M Tokens

raincole 31 days ago

The parent comment is correct. They are talking about GPT-4, which was really expensive by today's standards. After GPT-4o came out, GPT-4 was completely forgotten.

stavros 31 days ago

Yeah, even back then, ~nobody was using GPT-4 because it was released as some weird Sam Altman flex. Super expensive, not that capable.

toasty228 31 days ago

Datacenter GPUs pinned to 100% won't make it to their 3rd anniversary. Models are getting larger and larger, and they get smarter by running longer "reasoning" loops. There is no indication that it'll get better soon.

> Open source models are 3-6 months behind.

On the benchmarks included in their training set, yes. Not in real life.

nicce 31 days ago

This is only true if there is enough competition with equally good SOTA models. Otherwise, the price of the best models will keep increasing until people stop buying them and use humans instead, regardless of how much they actually cost to operate. There is a reason why a certain unnamed non-profit turned into a for-profit company.

raincole 31 days ago

It's just like saying every dependency is a ticking bomb. In a very strict sense, it's true. But it really doesn't matter for most businesses (and absolutely doesn't matter for early stage startups.)

drzaiusx11 31 days ago

Depends on the domain really, along with your and your users' aversion to risk. On average I'd agree your take holds true though.

gjsman-1000 31 days ago

> sad dark HN loser path

Assertion assertion assertion wishful thinking assertion.

Show, don't tell. Show us that we're wrong and this isn't a VC black hole. The CEO of Enron as late as September 2001 could've called every critic a sad dark loser with nobody challenging him publicly. Jim Cramer famously yelled that anyone pulling their money from Bear Stearns in 2008 was "silly, do not be silly" exactly 8 days before their collapse and a -92% stock drop. In COVID, calling everyone paranoid and sensationalist about some mythical new flu was popular in December 2019 and gone by March 2020. How about Uber, the seeming go-to for how VCs can turn a money hole into a profitable business? The average price increase is now 18% per year and still going up, with an over 60% increase in 5 years. Does anyone still talk about the "sad dark HN loser path" of those who doubted VR in 2018? How's your VR startup doing?

drivebyhooting 31 days ago

The world is my oyster.

Meanwhile there are layoffs everywhere, childcare costs keep rising, products shrinkflate.

aleqs 31 days ago

Very much agree - efficiency improvements are very real on both the model and hardware side. The reliance on proprietary OpenAI/Anthropic APIs is a problem though, one that will naturally resolve itself in favour of self-hosted/open models.

eikenberry 31 days ago

Wasn't GPT-5.5 much more expensive to train? Isn't training new models where most of the cost lies, and isn't that cost going down far more slowly than inference cost? I'm not arguing with your overall point that these tools are going to stick around, but your assumption that tokens will get significantly cheaper seems to rely on them not training anymore.

theonemind 31 days ago

I can't tell whether you're defending AI or blind optimism. I don't agree with either.

mmcnl 31 days ago

I don't think so. AI use is still very limited. For OpenAI and Anthropic and the AI boom to match their valuation, AI adoption needs to increase substantially. The current constraint is data centers. Pricing will be heavily influenced by market dynamics. Plenty of things that should be cheap aren't because of scarcity (simple example: RAM).

drzaiusx11 31 days ago

To be frank, we live in sad dark HN loser times.

runtime_terror 31 days ago

Lot of "trust me bro" vibes with this post

094459 31 days ago

Moore's law ftw

> Tokens will get cheaper

> it costs OpenAI less money to serve GPT-5.5 than GPT-4

> Ppl don't understand how much efficiency gains are being made

I guess "ppl" also don't understand then, with all the supposed "efficiency gains" and "tokens getting cheaper" how come MS GH Copilot is switching everyone to token-based billing? Must be because those tokens are so damn cheap, innit?

sponnath 31 days ago

I feel like they're also ignoring the increase in actual real world use costs due to reasoning. Just looking at token costs doesn't capture the whole picture.
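A rough sketch of this point in Python: the per-token output prices are the ones quoted earlier in the thread, but the token counts are made up purely for illustration, and billing reasoning tokens at the output rate is an assumption rather than something stated here. The helper name `request_cost` is hypothetical.

```python
# Output list prices quoted earlier in the thread, in USD per 1M tokens.
OUTPUT_PRICE_PER_M = {"GPT-4 (original API)": 60.0, "GPT-5.5": 30.0}


def request_cost(model: str, answer_tokens: int, reasoning_tokens: int = 0) -> float:
    """Output-side cost in USD of a single request.

    Assumes reasoning tokens are billed at the output rate (an assumption).
    """
    billed_tokens = answer_tokens + reasoning_tokens
    return billed_tokens * OUTPUT_PRICE_PER_M[model] / 1_000_000


# Hypothetical request: the same 500-token answer, with and without
# 4,000 tokens of reasoning in front of it.
old_cost = request_cost("GPT-4 (original API)", answer_tokens=500)
new_cost = request_cost("GPT-5.5", answer_tokens=500, reasoning_tokens=4_000)

print(f"no reasoning, $60/1M:   ${old_cost:.3f} per request")
print(f"with reasoning, $30/1M: ${new_cost:.3f} per request")
```

With these invented token counts the cheaper per-token model ends up costing more per request, which is the gap between token price and real-world cost described above. Different token counts would give a different outcome.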

ZephyrBlu 31 days ago

The fact you are trying to use Copilot as an example here shows you don't understand how Copilot's previous billing worked.

Previously they used "premium requests" which would allow you to make a request to one of the more expensive models. People abused the shit out of this because a request was disconnected from tokens.

You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.

Tokens for a given intelligence level are becoming much cheaper very quickly, but everyone wants to use the smartest frontier models so tokens are not dirt cheap. Even frontier models are a bit cheaper in absolute terms than they previously were, and much cheaper in terms of intelligence.

> shows you don't understand how Copilot's previous billing worked

Having used it for > 4 years and having paid for it for > 2.5 years, I think I know full well how its previous billing worked.

> You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.

Gee, thanks Mr. Obvious! It never occurred to me this was the reason Microsoft recently removed Opus 4.6 and added a 15x multiplier in front of the inferior, but less token-intensive Opus 4.7!

ZephyrBlu 31 days ago

Why would you extrapolate from Microsoft's very poor setup to tokens in general then if you know it's stupid and not representative?

? How TF is it not representative, if it provides an interface to literally ALL the major models?? What are you talking about mate?

ZephyrBlu 30 days ago

No other provider works like Copilot did with "premium requests". Usage limits (Codex/Claude Code), which are inherently linked to tokens, are the most common. Some providers like Amp charge you per token, which is what Copilot is moving to.

Microsoft's previous model was not linked to tokens at all. Complete anomaly among coding agent providers. It's not representative of token economics at large. Claude Code recently announced increased limits. Codex does regular limit refreshes.

Tokens are pretty damn abundant even though they're not bargain basement cheap yet.

new_account_100 31 days ago

Where do you work?