by hansmayer 31 days ago
> Tokens will get cheaper

> it costs OpenAI less money to serve GPT-5.5 than GPT-4

> Ppl don't understand how much efficiency gains are being made

I guess "ppl" also don't understand, then: with all the supposed "efficiency gains" and "tokens getting cheaper", how come MS GitHub Copilot is switching everyone to token-based billing? Must be because those tokens are so damn cheap, innit?


I feel like they're also ignoring the increase in actual real world use costs due to reasoning. Just looking at token costs doesn't capture the whole picture.
The fact you are trying to use Copilot as an example here shows you don't understand how Copilot's previous billing worked.

Previously they used "premium requests" which would allow you to make a request to one of the more expensive models. People abused the shit out of this because a request was disconnected from tokens.

You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.
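
To put rough numbers on that, here is a minimal sketch of flat per-request billing versus what the tokens behind a request cost to serve. Every figure in it is a made-up assumption for illustration (the per-million-token price, the flat request price, the token counts), and `token_cost` and `provider_margin` are hypothetical helper names, not anything from Copilot's actual billing.

```python
# Hypothetical numbers for illustration only; NOT actual Copilot or model prices.
PRICE_PER_MILLION_TOKENS = 15.00  # assumed blended $ per 1M tokens for a frontier model
FLAT_PRICE_PER_REQUEST = 0.04     # assumed flat charge for one "premium request"


def token_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Cost of serving `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def provider_margin(tokens: int) -> float:
    """Flat per-request revenue minus token cost; negative means the provider loses money."""
    return FLAT_PRICE_PER_REQUEST - token_cost(tokens)


# A short chat-style request, a large agentic request, and a very long agentic run.
for tokens in (2_000, 200_000, 2_000_000):
    print(
        f"{tokens:>9} tokens: cost ${token_cost(tokens):.2f}, "
        f"margin ${provider_margin(tokens):+.2f}"
    )
```

Under these assumed prices, a small request roughly breaks even, while a single two-million-token request costs about $30 to serve against a few cents of revenue. That gap is what per-request billing leaves open when a request is disconnected from tokens.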

Tokens for a given intelligence level are becoming much cheaper very quickly, but everyone wants to use the smartest frontier models, so the tokens people actually buy are not dirt cheap. Even frontier models are a bit cheaper in absolute terms than they used to be, and much cheaper relative to the intelligence they deliver.

> shows you don't understand how Copilot's previous billing worked

Having used it for > 4 years and having paid for it for > 2.5 years, I think I know full well how its previous billing worked.

> You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.

Gee, thanks Mr. Obvious! It never occurred to me this was the reason Microsoft recently removed Opus 4.6 and added a 15x multiplier in front of the inferior, but less token-intensive Opus 4.7!

Why would you extrapolate from Microsoft's very poor setup to tokens in general then if you know it's stupid and not representative?
How TF is it not representative, if it provides an interface to literally ALL the major models?? What are you talking about, mate?
No other provider works like Copilot did with "premium requests". Usage limits (Codex/Claude Code), which are inherently linked to tokens, are the most common. Some providers, like Amp, charge you per token, which is what Copilot is moving to.

Microsoft's previous model was not linked to tokens at all. Complete anomaly among coding agent providers. It's not representative of token economics at large. Claude Code recently announced increased limits. Codex does regular limit refreshes.

Tokens are pretty damn abundant even though they're not bargain basement cheap yet.