Hacker News
by twoodfin 65 days ago
From the limited perspective of software development, today’s models are well worth their per-token cost.

This reads to me like Anthropic anticipating demand and making a commitment to acquire supply. Not unlike airlines committing to future jet fuel purchases, or Apple committing to future DRAM volume.

3 comments

> From the limited perspective of software development, today’s models are well worth their per-token cost.

At the current price or real price? Anthropic said a $200 subscription can cost them $5000 so the real price could be anywhere from 10-30x the current price.
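For what it's worth, here is a quick check of the multiple those two figures imply. This is only a sketch: the $200 and $5000 numbers are the ones quoted in the comment above (not verified against any official source), and the variable names are made up for illustration.

```python
# Figures as quoted in the comment above; treat them as unverified.
subscription_price = 200   # USD per month, the subscription tier mentioned
worst_case_cost = 5000     # USD per month, the quoted worst-case serving cost

# How many times the subscription price the quoted worst case would be.
multiple = worst_case_cost / subscription_price
print(f"Quoted worst case is {multiple:.0f}x the subscription price")
```

That lands inside the 10-30x range mentioned, but only for the single worst-case figure; it says nothing about the typical subscriber.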

No, that is probably one of the worst cases they saw. Most likely the subscription inference cost is much lower than you expect. If you look at costs for similar open models, they are much lower than what you pay when buying from Anthropic, so that is the real cost basis I expect.

It's likely Amazon is making a fucking killing though.

While $5000 is a lot, people who rack up close to, or just over, a thousand dollars of "API equivalent cost" are pretty common.

> Most likely the subscription inference cost is much lower than you expect.

This is probably not true because they'd be screaming it off every rooftop were that the case.

Same deal with the API inference. Even the "profitable on inference" claim is sourced back to hearsay of informal statements made by OpenAI/Anthropic staff. No formal announcements, nothing remotely of the "You can trust what I'm saying, because if I'm lying the SEC will have my head" sort.

Yet making such statements would be invaluable. If Anthropic can demonstrate profitability before OpenAI, they could poach most of the funding. There's no reason to keep it a company secret.

And API inference is only part of the total costs, before even bringing in training and ongoing fine-tuning. If they're not even profitable on inference, how could they hope to be profitable overall?

I don't know about SEC rules, but the Anthropic CEO said they have a 50%+ margin on API pricing.
I'm going to be a dickhead for a moment here, apologies, there's no way to say this that isn't rude to you. This is still the same hearsay "In an interview, somewhere."

A bit of Google searching gets us a specific interview: https://www.dwarkesh.com/p/dario-amodei-2

> Let’s say half of your compute is for training and half of your compute is for inference. The inference has some gross margin that’s more than 50%.

But the context matters. The immediately preceding sentence is:

> Think about it this way. Again, these are stylized facts. These numbers are not exact. I’m just trying to make a toy model here.

Here, Amodei is in effect using weasel words. He is not making any actionable claim about Anthropic's margins, merely plucking an arbitrary number. Why 50%? Is 50% reasonable? Is 50% accurate for the company? Those are all conclusions the listener draws, not Amodei.
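To make the quoted toy model concrete, here is a minimal sketch of what it implies if you take it literally. Everything here is an assumption layered on the quote: `total_compute` is an arbitrary unit, the 50/50 split and the 50% margin are the stylized numbers from the interview, and `revenue_for_margin` is a hypothetical helper, not anything Anthropic has published.

```python
def revenue_for_margin(inference_cost: float, gross_margin: float) -> float:
    """Revenue needed so that (revenue - cost) / revenue equals gross_margin."""
    if not 0 <= gross_margin < 1:
        raise ValueError("gross_margin must be in [0, 1)")
    return inference_cost / (1 - gross_margin)


total_compute = 100.0               # hypothetical units of compute spend
inference_cost = total_compute / 2  # "half of your compute is for inference"
training_cost = total_compute / 2   # "half of your compute is for training"

# Stylized 50% gross margin on inference, taken from the quote.
revenue = revenue_for_margin(inference_cost, gross_margin=0.5)

# What is left after paying for ALL compute (training included).
net_after_compute = revenue - inference_cost - training_cost
print(f"revenue={revenue}, net after all compute={net_after_compute}")
```

Under these stylized numbers, a gross margin of exactly 50% only breaks even on compute, and "more than 50%" leaves a small surplus before any non-compute costs. It is still a toy model, not a statement about the actual company.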

> I don't know about SEC rules

The main premise is that, as a CEO, there are some regulations you are beholden to. You're not allowed to announce you've made a trillion dollar profit, sell all your stock, and then go "teehee just kidding". The SEC will prosecute you for securities fraud if you do that stuff.

This is what makes weasel words like the ones above suspicious: the exact statement Amodei gives is not prosecutable. He's not saying anything about the company, just doing a little "toy model".

How intentional it is that this hearsay travels, and gets extrapolated from "Well he picked 50% because it's a reasonable figure, and because he's CEO, a reasonable figure would have to be a figure akin to what his company can achieve" into "Anthropic has a 50% margin", is up for debate. Maybe it is intentional, maybe Amodei is exactly the same kind of shitweasel as Altman is. Probably he's just a dumbass who runs his mouth in interviews and for whatever reason cannot issue the true number in an authoritative statement to dismiss this misconception.

Hence my original comment: if the real number were better than the hearsay rumours of the number, Amodei would immediately issue a correction; it'd be great for the company. Hell, even if 50% were about the margin, that'd be great! To promote that from mere hearsay to "we're profitable, go invest all your money" would also be huge. Really, any kind of margin at all would put him ahead of OpenAI.

But he doesn't issue a correction. He doesn't affirm the statement. Perhaps he has other reasons for that, but a rather big reason could be that the margin number is in fact pretty bad.

Now, the observant reader will note I am also using a weasel word there. I do not know whether the number is good or bad; your takeaway should be "it could be bad", not "it is bad". Go pressure Amodei into giving us the real number.

Interesting. So the 50%+ number that's been floating about isn't even real. It's just an example.
Self reply as I could've explained the SEC thing better:

Anti-fraud regulators like the SEC give an inherent trustworthiness and credibility to CEOs and other market participants. You can trust that they're not lying to you, because they would be sent to jail if they were.

Another example is general anti-fraud regulation: consider how one would trust North American or European steel suppliers more than Chinese steel suppliers.

It's not that the Chinese are "evil lying people" and Americans are "saints who never lie", it's that you can trust American, Canadian, and European courts to hold the liars accountable by regulations even if you're not in any of those regions. But the Chinese liars won't be held accountable by regulations.

Thus also the opposite: if someone opts out of the credibility granted to them by anti-fraud regulations, their words may not be quite so truthful.

SEC rules mean a CEO cannot lie about or deliberately hide the cost of something.

50%+ margin statements have basically been "We are making 50% on delivering it." This does not include ANY of the costs of getting to this point: training, scraping, datacenters, people and so forth.

They are basically saying "Oh yeah, the cost of GAS in the car is only X, so charging Y per mile is a great margin" while ignoring maintenance, the cost of acquiring the car, and so forth.

but comparing your margin of charging to drive a mile to the price of gas makes a lot of sense? that is the only variable cost in the equation. training / scraping / people are all pretty much fixed costs.
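Both sides of the gas-versus-car argument can be put into one small sketch. All the numbers below are hypothetical and chosen only to illustrate the shape of the disagreement; `margins` is a made-up helper, with "gas" standing in for variable (inference) cost and `fixed_costs` for the car, maintenance, training, and people.

```python
def margins(price_per_mile, gas_per_mile, miles, fixed_costs):
    """Return (gross_margin, net_margin) as fractions of revenue."""
    revenue = price_per_mile * miles
    variable = gas_per_mile * miles
    gross_margin = (revenue - variable) / revenue
    net_margin = (revenue - variable - fixed_costs) / revenue
    return gross_margin, net_margin


# Hypothetical figures: $1.00 charged per mile, $0.50 of gas per mile,
# and $8,000 of fixed costs regardless of how far the car is driven.
gross, net_low_volume = margins(1.00, 0.50, miles=10_000, fixed_costs=8_000)
_, net_high_volume = margins(1.00, 0.50, miles=100_000, fixed_costs=8_000)

print(f"gross margin:             {gross:.0%}")
print(f"net margin, 10k miles:    {net_low_volume:.0%}")
print(f"net margin, 100k miles:   {net_high_volume:.0%}")
```

The gross margin is the same at any volume, which is the point about gas being the only variable cost. Whether the business as a whole makes money depends on how large the fixed costs are relative to volume, which is the point about ignoring the car.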
That's a tad naive. CEOs can lie, have lied, and often do lie about everything:

Sam Bankman-Fried, Elizabeth Holmes, Kenneth Lay - and hundreds if not thousands more.

The SEC is a regulatory agency, not able to bring criminal charges. The above-named for the most part had to be prosecuted by the Department of Justice or sometimes state attorneys.

> While $5000 is a lot, the people who rack up close or just over a thousand "API equivalent cost" are pretty common.

I think if you're not Anthropic and you don't have access to the actual data, then you can't say for sure. A bunch of anecdotes from terminally-AI people on Twitter does not make a convincing case for me, IMO.

On the other hand, if similarly sized models cost much, much less than this, why in the world would Anthropic have much higher costs than that?

Also, counterpoint, maybe they want you to think that they have higher costs so you're more willing to actually pay for it?

The "worst case" is probably someone just using up their $200 account limits. So yeah, the real cost is probably close to that.
At the full current retail API price.

Business buyers are paying API prices, not subscription

Disclosure: Work at Microsoft on AI

Are your API prices profitable?
And receiving investment from their vendor in exchange? When this is done in established companies it is typically called a kickback and directed toward one person, but in this case the whole thing is so incestuous the kickback goes straight to the top.
Is it crazy to imagine Anthropic can leverage short term cash flow now to build the models and products that will let them resell $100B in AWS infra with nice margins tomorrow?

If Amazon believes that story they’d be crazy not to invest.

Yes I understand why the agreement exists, but that does not remove the circularity.
But that per-token cost is a total joke. All these companies are fighting to build market share in some future dominated by one or two AI ecosystems. It is musical chairs until someone creates the one ring to rule them all. So they are charging token amounts just to claim revenue as they burn through investor dollars.

In short: per-token charges currently cover maybe 1% of the total costs in this field. To pay ongoing costs, and pay back investors, everyone will need to pay 100x or 1000x the current rates, likely for decades.
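As a sanity check on how those figures relate to each other, here is a tiny sketch. It assumes the simplest possible model, that prices must rise by the inverse of the share of costs currently covered; `required_price_multiple` is a hypothetical helper and the coverage figures are the commenter's guesses, not data.

```python
def required_price_multiple(coverage_fraction: float) -> float:
    """Price multiple needed to cover 100% of costs, given current coverage."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return 1 / coverage_fraction


# 1% is the coverage claimed above; 0.1% is what a 1000x increase would imply.
for coverage in (0.01, 0.001):
    multiple = required_price_multiple(coverage)
    print(f"{coverage:.1%} of costs covered -> {multiple:.0f}x price increase")
```

So 1% coverage corresponds to 100x just to break even on costs, and the 1000x figure would imply coverage closer to 0.1%, before paying anything back to investors.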

> In short: per-token charges currently cover maybe 1% of the total costs in this field

There are plenty of seemingly informed people saying the exact opposite, so that's a lot of confidence you're speaking with. I have a hard time believing it when we know what open-weights models cost to run. And sure, there are training costs, but again, many say inference costs are already above training costs.

If that's true, it's very unsustainable.

Gemma-4 26B-A4B + M5 MacBook Pro + OpenCode isn't Claude Code _yet_, but it's good enough that if I were forced to use it I would be fine.

Yes, it's amazing how quickly so many tech companies have hitched their tooling to these big AI vendors seemingly without any thought towards whether they'll still exist a year or three or five from now. Insane behavior. To the (debatable!) extent that AI coding tools are useful at all, wouldn't it be a hell of a lot smarter to self-host? At least that way you have some control over QoS, and a stable, predictable result... Or maybe nobody cares about that kind of thing anymore? What happened to basic business math in this industry?
The basic business math is (to start) software companies realizing that spending $10k, $20k, $50k (more?) per year, per developer for current models at current token rates might not be particularly insane, given the value return.

Models are likely going to keep getting better, and as costs go down, demand is likely to rise faster.

> as costs go down

Huh? Why would that happen? Indications are that costs will likely go up, especially if vendors are currently selling tokens at a loss.

The main operational expense of a million LLM tokens is pennies of electricity.

Even if you generously depreciate the GPU and other hardware, it’s hard to believe inference at scale in April 2026 isn’t highly profitable.

It’s getting better on both the hardware and the software fronts, and the barbarians are banging at the gates.
I'm not sure this information is grounded, but I remember reading somewhere that inference is indeed profitable. My personal experience is similar. Running 2x3090s draws 500-600W, and you can run amazing models locally with a similar setup.
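A back-of-the-envelope sketch of the electricity side, for anyone who wants to plug in their own numbers. Only the 600W draw comes from the comment above; the throughput figures and the $0.15/kWh electricity price are assumptions invented for illustration, and `electricity_cost_per_million_tokens` is a hypothetical helper.

```python
def electricity_cost_per_million_tokens(power_watts, tokens_per_second, usd_per_kwh):
    """Electricity cost in USD to generate one million tokens."""
    seconds = 1_000_000 / tokens_per_second          # time to emit 1M tokens
    kwh = (power_watts / 1000) * (seconds / 3600)    # energy used in that time
    return kwh * usd_per_kwh


POWER_WATTS = 600      # upper end of the 2x3090 draw mentioned above
USD_PER_KWH = 0.15     # ASSUMED electricity price

# ASSUMED throughputs: one local user vs. heavily batched serving.
single_user = electricity_cost_per_million_tokens(POWER_WATTS, 30, USD_PER_KWH)
batched = electricity_cost_per_million_tokens(POWER_WATTS, 3000, USD_PER_KWH)

print(f"single user (30 tok/s):  ${single_user:.2f} per 1M tokens")
print(f"batched (3000 tok/s):    ${batched:.4f} per 1M tokens")
```

The result is dominated by the throughput assumption, so whether electricity is "pennies" per million tokens depends mostly on how well requests can be batched, and this ignores hardware, cooling, and everything else.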
Running the model isn't the cost. Watts per token is the math they show investors. You also have to be constantly training new models, which currently needs more compute than servicing the customer base. You have to build datacenters, and possibly power plants to feed them. You have to carry debts. And you will need to buy new GPUs/RAM every few years to remain competitive. The total business is vastly different from simple GPU math.
You are in violent agreement.

> inference is indeed profitable

From the perspective of a deal like this, “total costs in the field” matter less than incremental cost per token served.

The unit economics for today’s frontier models should be great, and this suggests Anthropic believes they’ll get better.

In a decade the cost of compute will be a tiny fraction of what it costs now. Specialized hardware will exist that will be cheap and efficient.
The difference in the cost of compute between 2026 and 2036 won’t be nearly as large as the difference in the cost of compute between 2016 and 2026. Even in 2016 the slowdown in improvements was noticeable.

We might see a one-time bump in inference when we move off GPUs onto more limited and efficient dedicated hardware, but the sustained fast pace of improvement is far behind us.

I'm predicting that, now that there is a clear use case for this tech, work will accelerate (and already has) on specialized hardware, software, models, etc. that will run much more efficiently in 10 years, so that real token costs will be a fraction of what they are now.
You can run models on FPGAs and get massive cost, speed, and throughput gains (like 10x). The reason people don’t do it is that other (algorithmic) improvements mean nobody really thinks locking into a model makes sense…yet. Would I want to use gpt 4o for anything today at 1/10th the price? That would be $0.40 per input, $1.50 per output. Gemma-4 31b is much more capable and cheaper. So an FPGA version of the model is just not worth it today.

But if progress begins to slow down, then the economics work. Maybe Gemma 4 is a good example. It feels really generally useful. Getting it at 1/10th the cost feels like it could be competitive in 2 years.

The FPGA would be for prototyping. The real progress comes from ASICs ... exactly as we saw with bitcoin mining. This GPU-based approach will eventually give way to bespoke circuits once everyone picks a favorite model.
Compute power improvement between 2016 and 2026 wasn't that impressive either. Moore's law is essentially dying.
Yeah I went shopping for a new computer a couple of years ago (to replace a 7 year old computer) and... the specs for what was for sale were the same as what I bought 7 years prior, and the price wasn't much lower.
I would much rather buy a 2026 computer than a 2019 computer. Two generations of Nvidia GPUs, Apple M series chips, the X3D AMD chips, and PCIe 5 SSDs are all major upgrades.

It’s just that the pace of new stuff is slowing down, and many people are operating under the assumption that this wave will ride on forever.