HN Mirror

by lumost 66 days ago

For equal capability tokens, there has been about a 10x drop in cost every 6 months.

We are still chasing the best models because the frontier is moving rapidly, but it’s a simple thought experiment to work out what it costs to serve an 8B model from 2 years ago in a world of 2T models.

Note: parameter counts are illustrative. Concretely, Qwen3.6 27B delivers Opus 4.5-level capability at 1/27th the cost on OpenRouter. Single-chip Llama 3 8B throughput can exceed 17k tokens/sec.
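A quick sketch of the thought experiment's arithmetic. It assumes the claimed 10x-every-6-months rate holds as a smooth exponential decline; the function name and defaults are illustrative only:

```python
def cost_factor(months: float, drop_per_period: float = 10.0, period_months: float = 6.0) -> float:
    """Fraction of the original per-token cost remaining after `months`,
    ASSUMING a constant `drop_per_period`-fold decline every `period_months`."""
    return drop_per_period ** (-months / period_months)

# Two years at the claimed rate is 24 / 6 = 4 periods, i.e. 10**-4 of the original cost.
print(f"{cost_factor(24):.0e}")  # → 1e-04
```

If the rate held, equal-capability tokens from two years ago would cost roughly 1/10,000th of their launch price.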

byzantinegene 66 days ago

8B models would be considered obsolete in a world of 2T models, at least if we're talking about the competitiveness of OpenAI/Anthropic. The only reason those companies are valued so highly is their supposed dominance at the top end.

lumost 65 days ago

The main story for agent use cases so far is in the enterprise. An enterprise will only pay for a model capable of handling the task and no more. Most enterprises see no need to hire PhDs as factory line workers.

Coding is an interesting case because [1] the pace of progress has been absurd and [2] it's hard to put an upper bound on the required capability. However, "hard to bound" is not the same as "will always need more": it's quite possible that the average engineer will stop seeing a benefit from rapid progress, or that their employer will be satisfied with lower-tier models.

How smart a model do you need to build a high-quality CRUD app for internal users? Or to build a scalable web service?

byzantinegene 65 days ago

Yes, which is why the revenue growth story is not looking so great for Anthropic/OpenAI, when open-source alternatives are not far behind at much lower cost.

joshuahedlund 65 days ago

> For equal capability tokens, there has been about a 10x drop in cost every 6 months

Is this still happening? Opus 4.5 came out six months ago; can you get its capabilities for 1/10th the cost now? Are we on track to get the same for 4.6 in a couple of months?
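One way to check the trend against any pair of observations is to convert an observed cost ratio over some interval into an equivalent per-6-month drop. This is a minimal sketch assuming smooth exponential decline; the function name and the example inputs are hypothetical, not taken from any pricing data:

```python
def implied_drop_per_period(cost_ratio: float, months: float, period_months: float = 6.0) -> float:
    """Equivalent fold-decrease per `period_months`, given that cost fell by
    `cost_ratio` (old cost / new cost) over `months`.
    ASSUMES the decline is exponential over the whole interval."""
    if cost_ratio <= 0 or months <= 0:
        raise ValueError("cost_ratio and months must be positive")
    return cost_ratio ** (period_months / months)

# A 10x drop over 6 months matches the claimed trend exactly.
print(implied_drop_per_period(10, 6))  # → 10.0
# A hypothetical 10x drop that took 12 months implies only about 3.16x per 6 months.
print(round(implied_drop_per_period(10, 12), 2))  # → 3.16
```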

lumost 65 days ago

Pretty much. Kimi K2.6 is Opus 4.6 quality for coding. If you include the discounts from more efficient input caching, it costs around 1/10th of Opus 4.6.

https://openrouter.ai/moonshotai/kimi-k2.6

The march of cost efficiency moves on.
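To see how input caching changes the effective price, a workload's cost can be computed by blending cached and uncached input rates. This is a minimal sketch: it assumes per-million-token pricing with a flat cache-hit fraction, and every price and token count below is a hypothetical placeholder, not an actual OpenRouter rate for Kimi K2.6 or Opus 4.6:

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-million-token prices in USD (HYPOTHETICAL values for illustration)."""
    input_uncached: float
    input_cached: float
    output: float


def blended_cost(p: Pricing, input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """USD cost of one workload: `cache_hit_rate` of input tokens are billed at the
    cached price, the remainder at the uncached price, plus output tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = cached * p.input_cached + uncached * p.input_uncached + output_tokens * p.output
    return total / 1_000_000


# Hypothetical prices for a frontier model and a cheaper model.
frontier = Pricing(input_uncached=5.0, input_cached=0.5, output=25.0)
cheaper = Pricing(input_uncached=1.0, input_cached=0.1, output=4.0)

# ASSUMED workload: an agent re-sending a long shared context, so most input hits the cache.
args = dict(input_tokens=10_000_000, output_tokens=200_000, cache_hit_rate=0.9)
ratio = blended_cost(cheaper, **args) / blended_cost(frontier, **args)
print(f"cheaper model costs {ratio:.2f}x the frontier model")  # → 0.19x
```

The point of the sketch is that the comparison depends heavily on the cache-hit rate and on each provider's cached-input discount, so a headline per-token price alone can understate the gap.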

joshuahedlund 64 days ago

Why haven’t I heard of this? Is it available in IDEs like Cursor?
