Hacker News (HN Mirror)

by Ukv 51 days ago

Price of the current frontier may vary, but price for a given level of capability tends to drop pretty fast.

In April of last year you'd get 1431 Elo[0] from o3-2025-04-16 for $8.00 per million output tokens. In April of this year you can get 1436 Elo from deepseek-v4-flash for $0.20 per million output tokens.

[0]: https://huggingface.co/spaces/lmarena-ai/arena-leaderboard
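For anyone who wants to check the numbers in the comment above, here is a rough sketch of the arithmetic. The function names are made up for illustration, and it assumes the standard Elo expected-score formula applies to these leaderboard ratings, which may not match exactly how the arena computes them.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (assumed to apply here)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def price_drop(old_price: float, new_price: float) -> tuple[float, float]:
    """Return (fold reduction, percent reduction) between two prices."""
    fold = old_price / new_price
    percent = (1 - new_price / old_price) * 100
    return fold, percent


# Ratings and prices quoted in the comment (USD per million output tokens).
win_rate = elo_expected_score(1436, 1431)
fold, percent = price_drop(8.00, 0.20)

print(f"expected win rate of the newer model: {win_rate:.3f}")
print(f"{fold:.0f}x cheaper, {percent:.1f}% lower")  # 40x cheaper, 97.5% lower
```

A 5-point gap works out to an expected win rate only slightly above 50%, so under that assumption the two models are close to parity while the quoted price differs by a factor of 40.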

3 comments

saxenaabhi 50 days ago

Sure, but I don't think it's reasonable to hold a given level of capability constant in a landscape where a given consumer of AI also faces competitive pressures.

I can't use last year's SOTA model when my competitors can use the current SOTA model.

This is also baked into the eye-watering valuations of model companies.

margalabargala 50 days ago

> I can't use last year's SOTA model when my competitors can use the current SOTA model.

Lots of people can. Tools don't need to be top of the line to be useful. Snap-on may exist, but they don't put Harbor Freight out of business.

Advanced IDEs exist but complex projects were still built in vim.

The more capable the budget models get, the lower the marginal gains from using the frontier models, even if the frontier models always stay 6 months ahead.

onlyrealcuzzo 50 days ago

> I can't use last year's SOTA model when my competitors can use the current SOTA model.

You can use open source models of equivalent or better capabilities for ~90% less cost...

If you kick and scream hard enough, you can always find a data point to make sure you're correct.

No one is saying that last year's Opus model now costs 90% less than it did then.

That's not how it works.

There are better, more efficient models with equivalent capabilities that are 90% cheaper (see DeepSeek v4 Pro).

rzmmm 50 days ago

The ranking is not comparable across time like that.

Ukv 50 days ago

I'm using the models' current Elo scores, and both are still running in the arena.

Denzel 49 days ago

Aren’t DeepSeek models deliberately priced lower than the cost to deliver? They’re subsidized, which means the true cost is more than $0.2/Mtok.

Ukv 49 days ago

DeepSeek models are open-source so there are a bunch of third-party providers offering similar prices. Factoring in that DeepSeek have to train the model (whereas third parties can make a small profit over just the inference costs) I'd assume that on net they're spending investor money, but I wouldn't think that's any less true of OpenAI.

Denzel 49 days ago

Yes, DeepSeek is open-weight, but these third-party providers offering similar prices are subsidized with VC money as well. And you can find a range of prices for deepseek-v4-flash going up to and over $1/Mtok.

Even that $1/Mtok provided by Together AI is heavily subsidized by more than $1B in VC money.

This makes it unclear how the true cost curve is progressing. It’s not possible to comment confidently, one way or the other, on the rate at which cost is coming down when the entire industry is so heavily subsidized.

Ukv 49 days ago

> Even that $1/Mtok provided by Together AI is heavily subsidized

Can you link this? I'm unable to find them offering deepseek-v4-flash. I think you could even host the pro model for a bit under $1/Mtok. You can get ~1000TPS out of the box on a B300 that you can rent for ~$3/hr, so around $0.83/Mtok.
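The self-hosting estimate above can be reproduced with a few lines. This is only a sketch: the function name is hypothetical, and it assumes a single GPU kept fully busy at the quoted throughput, with no allowance for idle time, networking, staff, or margin.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    """USD per million output tokens for one GPU at 100% utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Figures quoted in the comment: ~$3/hr rental, ~1000 tokens per second.
print(round(cost_per_million_tokens(3.0, 1000), 2))  # 0.83
```

Real-world utilization below 100% would push the figure up proportionally, e.g. 50% utilization doubles it.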

Regardless - Alibaba, DeepSeek, NovitaAI, AtlasCloud, Cloudflare, DeepInfra, SiliconFlow, GMICloud, Morph, Baidu, Parasail, DigitalOcean, AkashML, StreamLake and likely others all seem to be offering it under $0.3 per million output tokens[0].

> This makes it unclear how the true cost curve is progressing

For no actual improvement in efficiency to be presented as a 10X yearly improvement since 2018, we'd need to currently be getting 100,000,000X more intensive models than we should be for what we're paying (a $1/Mtok model actually costs $100,000,000/Mtok). Presenting, say, a 9X actual yearly improvement as a 10X yearly improvement seems feasible, but for much beyond that I think the exponential just compounds too fast to reasonably fake.
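The compounding argument above can be sketched numerically. The function name is made up for illustration, and the 8-year span is an assumption inferred from "since 2018" and the 100,000,000X figure (10 to the 8th power).

```python
def hidden_subsidy_factor(claimed_yearly: float,
                          actual_yearly: float,
                          years: int) -> float:
    """How far today's price would understate true cost if the claimed
    yearly improvement were really only the actual yearly improvement."""
    return (claimed_yearly / actual_yearly) ** years


YEARS = 8  # assumption: 2018 through the year the thread was written

# No real improvement at all, presented as 10X per year.
print(hidden_subsidy_factor(10, 1, YEARS))            # 100000000.0

# A real 9X per year, presented as 10X per year.
print(round(hidden_subsidy_factor(10, 9, YEARS), 2))  # 2.32
```

So passing off 9X as 10X would only need prices to understate true cost by a bit over 2X today, while faking the whole trend would need a gap of eight orders of magnitude.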

[0]: https://openrouter.ai/deepseek/deepseek-v4-flash#pricing
