by Palmik 381 days ago
> API that is likely a loss-leader to grab market share (hosted LLM cloud models).

I don't think so, not anymore.

If you look at API providers that host open-source models, you will see that they have a very healthy margin between their API price and their inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.

As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.

[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware-wise. Even assuming use of off-the-shelf software, they are doing well.

3 comments

You're leaving out their training costs. You might say "well, once they're trained you don't have to spend more on that", but as we've seen, they have to keep training new models on new data, such as current events and new language features and APIs. And some aspects of that training are becoming more costly, or the data more scarce: companies like Reddit and Stackoverflow restrict and sell their data, less data gets produced on Stackoverflow as people switch to using LLMs instead, website operators go to more extreme measures to block AI scrapers that ignore robots.txt, etc.

Yeah, people tout RAG and fine-tuning, but lots of people just use the base chat model; if it doesn't keep up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?

I use whisper to transcribe long conversations, and deploying the model myself on vastai is ten times cheaper than OpenAI's API offering.
I’m assuming doing transcription on a vast GPU is also ten+ times faster than local options?

https://news.ycombinator.com/item?id=44225953

I don’t completely disagree, but “assertion one” [1]

[1] ~ you can obviously verify this yourself by doing it yourself and seeing how expensive it is.

…is an enormously weak argument.

You suppose. You guess. We guess.

Let’s be honest, you can just stop at:

> I don’t think so.

Fair. I don’t either; but that’s about all we can really get at the moment afaik.

No, the point of [1] is that this is not some "secret knowledge". My response is based on running models in production and comparing my costs with the costs I would pay to API providers running the same models.
He's not wrong. If you can run an open-weights model in any cloud, you can very straightforwardly estimate the cost of running the model. Considering that these providers either use long-term contracts or maybe even buy their own hardware, this theoretical cloud deployment is itself an overestimate of the costs.
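For anyone who wants to write the estimate down, here is a minimal sketch of the arithmetic, not anyone's actual numbers. The function names, the utilization factor, and every figure in the usage example are hypothetical placeholders; plug in your own measured throughput and your own rental price. It covers hardware cost only, so training, staff, and networking are left out.

```python
def cost_per_million_tokens(gpu_hourly_usd: float,
                            num_gpus: int,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Hardware cost in USD to serve one million tokens.

    tokens_per_second is the aggregate throughput of the whole deployment;
    utilization is the fraction of each hour the hardware does useful work.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    hourly_cost = gpu_hourly_usd * num_gpus
    return hourly_cost / tokens_per_hour * 1_000_000


def gross_margin(api_price_per_million: float,
                 hardware_cost_per_million: float) -> float:
    """Fraction of the API price left after hardware cost (hardware only)."""
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return 1 - hardware_cost_per_million / api_price_per_million


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    cost = cost_per_million_tokens(gpu_hourly_usd=2.0, num_gpus=8,
                                   tokens_per_second=4000, utilization=0.5)
    print(f"hardware cost per 1M tokens: ${cost:.2f}")
    print(f"margin at a $5.00 per 1M API price: {gross_margin(5.0, cost):.0%}")
```

Renting on demand is the pessimistic case, so a number computed this way is an upper bound on what a provider with long-term contracts or owned hardware would pay for the same throughput.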
…and it’s perfectly legit to run that, write the numbers down and link to it.

But:

A) it makes absolutely no difference to the fact you have no idea what the big LLM providers are actually doing.

B) Just asserting some random thing and saying “anyone competent can verify this themselves” is a weak argument. You’re saying you’ve done the research, but failing to provide any evidence that you actually have.

If you’ve crunched the numbers then man up and post them.

If not, then stop at “I think…”

“This is based on my experience running production workloads…” is a nice way of saying “I don’t have any data to back up what I’m saying”.

If you did, you could just link to it.

…by not posting data you make your argument non-falsifiable.

It is just an opinion.