| HN Mirror


by KronisLV 3 hours ago

> To give an example, just doing Typescript type fixes with this model across 50 files cost me $54 this afternoon.

If you can use a subscription with any of the SOTA models, do that.

Instead of around 4k EUR in token costs, my Opus usage costs me 108 EUR (with taxes) per month with their Max 5x plan. It's the same with OpenAI; those plans are heavily subsidized.

It doesn't make sense to pay per-token, unless you must.

> What is happening here is that leading AI labs are charging not only for inference but also for research in model architecture, training data collection and curation, model training cost (which can be tens or even hundreds of millions of dollars), paying their employees and recovering the marketing costs.

Chances are, they're never getting that money back. Best case scenario, the hype around AI slowly declines, worst case - it crashes and takes a part of the economy with it.

Also, anyone doing distillation with hundreds or thousands of those subsidized accounts is probably winning big. Especially as the model architectures (e.g. DeepSeek V4) are more oriented towards efficiency.

> Last but not least and in fact the most important factor, is the ability of users to run local models. So far, almost everyone is using cloud-hosted models and local models are either too big to deploy or too slow to work with. With advancements in chips, this will change in 4-5 years’ time.

Currently, beefy hardware to run them fast enough to be competitive with the cloud (at least 60 tps) is expensive, and even then the small local models really suck compared to SOTA or even DeepSeek V4 Pro and GLM 5.2, though they're way better than they used to be (compare Qwen 3.6 with 2.5 for example).

5 comments

Aldipower 2 hours ago

> If you can use a subscription with any of the SOTA models, do that.

Those subscription plans are for private use only! If you are running a business, you are not actually allowed to use them. Anyway...

user43928 2 hours ago

Opus 4.8 High effort seems adequate for me currently, at API pricing, with a $200/month budget.

This is at work where I don't work on greenfield or parallelize feature development.

I cannot see the agent burning through $50 for one moderately sized TypeScript cleanup in my setup. This sounds like something that can be improved on OP's side.

There have been rumors about a potential Sonnet 5 model release in the near future, which hopefully tilts the cost/benefit ratio further in our favor.

KronisLV 1 hour ago

> I cannot see the agent burning through $50 for one moderately sized TypeScript cleanup in my setup.

Here's my usage, from the ccusage tool (slightly shortened for readability):

  ┌──────────┬───────────────┬────────────┬─────────────┬─────────────┬───────────────┬────────────────┬────────────────┬─────────────┐
  │ Month    │ Agent         │ Models     │       Input │      Output │  Cache Create │     Cache Read │   Total Tokens │  Cost (USD) │
  ├──────────┼───────────────┼────────────┼─────────────┼─────────────┼───────────────┼────────────────┼────────────────┼─────────────┤
  │ 2026-06  │ - Claude      │ - opus-4-8 │  13,635,792 │  32,562,574 │   177,985,265 │  5,265,814,971 │  5,489,998,602 │    $4665.09 │
  └──────────┴───────────────┴────────────┴─────────────┴─────────────┴───────────────┴────────────────┴────────────────┴─────────────┘

Now obviously that is all with the Max 5x subscription, other agents and models excluded.

So per day that'd be around 155 USD (including weekends), which doesn't seem that far off, as long as the example cleanup takes up around 1/3 of one's daily work (or needs a lot of review/test iterations, or needs to review a lot of the existing code etc.).
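
The per-day figure follows directly from the monthly total in the table above. A minimal sketch of the arithmetic, assuming a 30-day month and treating the one-third share as a purely illustrative fraction:

```python
# Monthly API-equivalent cost reported by ccusage in the table above (USD).
monthly_cost_usd = 4665.09
days_in_month = 30  # June, weekends included

daily_cost_usd = monthly_cost_usd / days_in_month

# Assumption for illustration: the cleanup takes about a third of a day's usage.
cleanup_share = 1 / 3
cleanup_cost_usd = daily_cost_usd * cleanup_share

print(f"per day: ${daily_cost_usd:.2f}")               # per day: $155.50
print(f"one third of a day: ${cleanup_cost_usd:.2f}")  # one third of a day: $51.83
```

That lands close to the $54 quoted for the TypeScript cleanup.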

user43928 1 hour ago

Interestingly it seems 80% of the cost is in the cached tokens.

I do not know whether that is typical, or indicative of conversations with too many turns.

Not that I would worry about this on a subscription plan, but at work where we are billed at API rates, I try to move to new conversations as often as possible.
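
A rough way to check that share, using the token counts from the ccusage table above. The per-million-token prices here are assumptions, not figures stated in this thread: $5 input, $25 output, $6.25 cache write and $0.50 cache read. The real opus-4-8 rates may differ, so the total is not expected to match the ccusage figure exactly.

```python
# Token counts from the ccusage table above.
tokens = {
    "input": 13_635_792,
    "output": 32_562_574,
    "cache_create": 177_985_265,
    "cache_read": 5_265_814_971,
}

# ASSUMED prices in USD per million tokens (not stated in this thread).
price_per_mtok = {
    "input": 5.00,
    "output": 25.00,
    "cache_create": 6.25,
    "cache_read": 0.50,
}

# Cost per token category, then the combined share of cache writes and reads.
cost = {kind: tokens[kind] / 1_000_000 * price_per_mtok[kind] for kind in tokens}
total = sum(cost.values())
cache_share = (cost["cache_create"] + cost["cache_read"]) / total

for kind, usd in cost.items():
    print(f"{kind:>12}: ${usd:>8.2f} ({usd / total:.1%})")
print(f"cache share of cost: {cache_share:.1%}")
```

Under these assumed prices, cache writes and reads together come to roughly 81% of the total, which is in line with the 80% estimate. Cache reads are the largest line item even though they are the cheapest per token, simply because there are so many of them.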

KronisLV 1 hour ago

For agentic development upwards of 90% is pretty normal!

For example, that's what you get if you make Claude Code explore a codebase, write a plan based on it and your requirements, do a few iterations of further specifying and altering it, and afterwards let it work for, let's say, 2-4 hours.

Sub-agents and dynamic workflows do alter the numbers a bit, but not to a crazy degree in the long run.

xienze 2 hours ago

> I cannot see the agent burning through $50 for one moderately sized TypeScript cleanup in my setup.

I have absolutely seen stuff like this happen. Think about it, when you point Claude at a bunch of files, it has to suck them all up (tens of thousands of tokens), spend some proportional number of tokens doing stuff, and spit them back out (tens of thousands of tokens) for each pass in the "cleanup" loop. I had a similar situation occur a few months ago. Very small "add Javadoc to these dozens of classes" scenario. Sonnet rapidly rate limited my $20 plan so I switched to extra usage. A very small (IMO) number of changes later I had spent like $7 in tokens.

The main problem is you really have no idea ahead of time just how many tokens a given task is going to take. I suggest you try spending a day running your Opus 4.8 High effort on API pricing to see just how much your $200 subscription is being subsidized before you confidently state that $50 for some TS cleanup task isn't possible.

user43928 1 hour ago

I've spent a week doing just that - I said at API pricing, $200/month currently seems adequate for 2-4 weeks of usage for me at work.

$50 would be 10M input tokens, not tens of thousands.
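
The 10M figure implies a price of $5 per million input tokens. A quick sketch of that conversion; the $25 per million output price is an assumption added for comparison, not a number from this thread, and `tokens_for_budget` is a hypothetical helper:

```python
def tokens_for_budget(budget_usd: float, usd_per_mtok: float) -> int:
    """Return how many tokens a budget buys at a given price per million tokens."""
    return int(budget_usd / usd_per_mtok * 1_000_000)


INPUT_USD_PER_MTOK = 5.0    # implied by "$50 would be 10M input tokens"
OUTPUT_USD_PER_MTOK = 25.0  # ASSUMED for illustration

print(tokens_for_budget(50, INPUT_USD_PER_MTOK))   # 10000000
print(tokens_for_budget(50, OUTPUT_USD_PER_MTOK))  # 2000000
```

So the same $50 goes much less far if a large part of the spend is on output rather than input tokens.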

xienze 1 hour ago

> I said at API pricing, $200/month

Well I saw $200/month and thought you were talking about a max plan, sorry. But I will say unless you're using that top end model extremely judiciously $200 for 2-4 weeks of work is similarly hard to believe (see the other poster breaking down their usage). What are you typically doing? Must be pretty hardcore stuff if you need to use the baddest available model. How many interactions per day? Care to share your token usage stats?

> $50 would be 10M input tokens, not tens of thousands.

Two things. One, input tokens are but one component, and the cheapest. Output tokens include the tens of thousands being spit out for file changes AND the thinking/crunching that you don't see. And that's the most expensive part. Two, remember that's per iteration; not everything is one-shot (especially with tasks like "fix this large part of my codebase").

user43928 1 hour ago

I don't have stats for what I use at work. This week I have been working on React frontend, also with TypeScript.

It is not my experience that you need to do 'hardcore stuff' to require the use of a large model. The difference in productivity between babysitting Sonnet and trying to get the result into a good shape compared to using Opus 4.8 seems large to me.

At home, unfortunately I only have the stats from the official apps rather than granular ones, and it looks like the Claude Desktop app is buggy: it was showing 17M tokens total in the last 30 days, but even just clicking on a conversation in my side bar increased the counter to 19M. It's clearly not working.

Codex shows up to 900M tokens total/week.

ed_elliott_asc 1 hour ago

I think it will end up a game of haves and have-nots: governments and companies with spare cash will go large and capitalise on the benefits of AI, while everyone else will be left behind.

christkv 3 hours ago

The subscriptions will probably disappear or end up only allowed to use gimped versions of the models long term.

ReptileMan 3 hours ago

Why do you think that subscriptions are subsidized and not that enterprise tokens are sold at 3000% margin? There are few enough frontier labs that a cartel is possible.

KronisLV 47 minutes ago

The value of tokens isn't necessarily objective and there's a bunch of factors that go into it: infrastructure costs and depreciation, the operational expenses for that and personnel, as well as a bunch of R&D for training new SOTA models that others can just distill at least partially.

It's obvious that they'll milk corpos that actually have money, unlike me, because I'm broke, as are many other subscription users. The large AI labs don't seem to be profitable, so I bet for the regular users there's plenty of subsidizing going on to at least get people to use the tech (and maybe that'd lead to some conversions at work or API usage eventually):

> OpenAI's net loss ballooned from $5 billion in 2024 to a staggering $39 billion last year, as it continued to spend heavily on AI model development and securing compute capacity, the Financial Times reported on Tuesday, citing audited financial figures confirmed by its sources.

https://finance.yahoo.com/markets/stocks/articles/openai-fin...

Inference itself might be profitable, but is just funding the training and other stuff.

_flux 3 hours ago

I think this comes from the idea that serving these tokens without paying for training is already expensive, e.g. https://news.ycombinator.com/item?id=46613887: a self-hosted solution might only be 10-100x more affordable, and that is at cost.

So, given that the SOTA providers with even larger models also need to continuously use considerable resources for training their next models, fund future data centers, and make a profit, the token costs are more likely to reflect the real costs than the subscription costs are.

LUmBULtERA 1 hour ago

Except there are plenty of inference providers worldwide (including the US) that serve open-weight models that are not subsidized, and are reasonable in cost. Or is your claim that those are all running at a loss?

_flux 1 hour ago

So they do not train models, and in addition their models are expected to be smaller than SOTA models, although we cannot know for sure by how much.

So what's the price difference, 3000x?

LUmBULtERA 1 hour ago

My comment is about your statement "serving these tokens without paying for training is already expensive"...

One thing we do know from OpenAI's leaked financial document is that they are already profitable on inference, though that data is not broken down by cost and revenue of API vs. subscription. One important factor is that subscription inference can be optimized in ways to reduce cost (e.g., usage limits, batch optimization around API-prioritized inference, etc...). I think we simply do not know the actual cost of subscription inference for SOTA models.