Hacker News
by otabdeveloper4 39 days ago
> paying $100/month

There will not ever be a monthly subscription for LLM tokens. The economics isn't there.

Local tokens will always be cheaper.

2 comments

What's the basis for saying local tokens will always be cheaper? As others have outlined, LLMs serving one user at a time are pretty expensive, but concurrent users become much more cost-effective (assuming there's enough RAM for the contexts). If "local" to you means ~10 hours daily use by a team of employees, the company still has to balance against cloud services that can amortize non-recurring costs over 24 hours of service per day.
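To make the amortization point concrete, here is a minimal Python sketch. The hardware price and lifetime are made-up placeholders, not measurements; only the ratio between the two utilization levels matters.

```python
def capex_per_busy_hour(hardware_cost, lifetime_days, busy_hours_per_day):
    """Fixed hardware cost spread over the hours the machine actually serves requests."""
    if not 0 < busy_hours_per_day <= 24:
        raise ValueError("busy_hours_per_day must be in (0, 24]")
    return hardware_cost / (lifetime_days * busy_hours_per_day)


# Hypothetical placeholder numbers, chosen only for illustration.
HARDWARE_COST = 10_000.0  # assumed purchase price
LIFETIME_DAYS = 1_000     # assumed useful life

office_hours = capex_per_busy_hour(HARDWARE_COST, LIFETIME_DAYS, 10)
around_the_clock = capex_per_busy_hour(HARDWARE_COST, LIFETIME_DAYS, 24)

print(f"10 h/day: {office_hours:.3f} per busy hour")
print(f"24 h/day: {around_the_clock:.3f} per busy hour")
print(f"ratio:    {office_hours / around_the_clock:.1f}x")
```

The ratio is 24/10 = 2.4 whatever price and lifetime you plug in: the same box used 10 hours a day carries 2.4x the capital cost per busy hour of one that is busy around the clock.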
Why would a team of employees not be able to run AI workloads 24/7? Not all workloads are time sensitive.
Both my experience and Anthropic's off-peak promotion indicate that demand is very uneven between peak and off-peak hours. How close do you think they are?
But that's demand for cloud inference that's priced on a flat-rate basis with some adjustments (like "off-peak hours"). Not a local rig where inference is effectively free aside from the cost of power whenever the system isn't congested.
The local rig is not free and requires very large capital expenditures while producing very low token throughput for large models. Within any time budget, you can get many orders of magnitude more large-model tokens off an 8xB200 than off a local rig. Therefore cloud tokens have a huge capital efficiency advantage over local rigs. That will continue basically forever, since there will always be large cloud companies willing to spend millions of dollars for more capital-efficient hardware, so Nvidia and friends will continue to spare no expense producing it, meaning the cloud hardware will be way too expensive if you're not a large inference company. You can also buy local rigs, but they will be less capital efficient per token, not more.

(This is a generous argument: it also ignores the massive software stack optimization the cloud companies do that doesn't trickle down to local-rig-sized deployments; for example, prefill/decode disaggregation, which would double the VRAM requirements for a local rig — if you could even do it on a local rig, which you can't, because local rigs don't have Infiniband. But at scale, prefill/decode disaggregation improves capital efficiency, since you can tune the compute-bound prefill node differently than the memory-bound decode node.)
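The capital-efficiency claim can be written as a back-of-the-envelope formula: capital cost per token = purchase price / (throughput x utilization x lifetime). A rough Python sketch follows. Every number in it is a hypothetical placeholder (not a benchmark or a real price for any hardware), and the `Rig` class is invented here purely for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass
class Rig:
    name: str
    capex: float              # purchase price (hypothetical)
    tokens_per_second: float  # aggregate throughput over all concurrent requests (hypothetical)
    utilization: float        # fraction of time spent serving, between 0 and 1 (hypothetical)
    lifetime_years: float = 3.0  # assumed useful life

    def capex_per_million_tokens(self) -> float:
        """Purchase price divided by lifetime token output, scaled to one million tokens."""
        lifetime_tokens = (
            self.tokens_per_second
            * self.utilization
            * self.lifetime_years
            * SECONDS_PER_YEAR
        )
        return self.capex / lifetime_tokens * 1_000_000


# Placeholder scenario: the big server costs 100x more but pushes 1000x the tokens
# and sits idle less often. These ratios are assumptions, not measurements.
small = Rig("small local rig", capex=5_000, tokens_per_second=50, utilization=0.3)
large = Rig("large batched server", capex=500_000, tokens_per_second=50_000, utilization=0.8)

for rig in (small, large):
    print(f"{rig.name}: {rig.capex_per_million_tokens():.4f} per million tokens")
```

Under these assumed ratios the expensive machine is cheaper per token. The conclusion flips if the throughput gap is smaller than the price gap, or if the local hardware is already paid for, so the inputs are what the argument is really about.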

The advantage of local rigs is not capital-efficient tokens. It's privacy. But then again, you can get zero-data-retention options from many inference companies, so for many use cases it may not matter unless you need strict guarantees the data never leaves the building...

> The local rig is not free and requires very large capital expenditures while producing very low token throughput for large models.

Sometimes it really is free though, because the hardware was bought to serve some other existing needs and that capital expense was fully depreciated quite some time ago. Underutilised hardware is essentially ubiquitous.

> Within any time budget, you can get many orders of magnitude more large-model tokens off an 8xB200 than off a local rig.

But using that 8xB200 setup to run inference on cheap, non-frontier models is a plain waste. Its highest and best use is in an AI datacenter serving exceptionally smart models like Gemini DeepThink, GPT Pro or Claude Mythos. (If this isn't true, it means that the current level of large-scale investment in frontier, super intelligent AI is misplaced, and you should worry about that; not whether some models are best run on lower-end hardware!)

There already are many subscriptions for LLM tokens: OpenAI, Claude, Synthetic (shameless plug), Zai...

I'm not sure what you mean by "There will not ever be a monthly subscription for LLM tokens." That already exists!

Monthly subscriptions are a "first hit is free" promo.

In the future LLMs will be priced per token, not all-you-can-eat.

LLM subscriptions are not "all you can eat": they have rate limits, and fundamentally there is no difference between subscription-with-rate-limit and typical usage-based business practices. Subscriptions are simply usage-based pricing with volume discounts in exchange for upfront payment; every single usage-based provider of pretty much anything offers the same kind of discounting for buying volume commitments upfront. From a business standpoint, though, subscriptions are even better than volume discounts, because they're recurring, whereas reserved volume might not recur.

Subscriptions aren't gonna go away. They're great for businesses. Rate limits or pricing might change but the underlying business model is very good.

The reason usage-based is so much more expensive than subscription isn't that usage-based is the "true" cost and subscription is a loss leader. Likewise, 30 consecutive day passes to a gym costing more than a monthly membership doesn't mean memberships are a loss leader. Memberships are the business model! The day passes are overpriced to steer you into buying the membership.
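The gym comparison is simple arithmetic, sketched below in Python. The day-pass and membership prices are invented for illustration, not real gym or LLM prices, and the two helper functions are hypothetical names.

```python
import math


def break_even_uses(pay_per_use_price, flat_fee):
    """Smallest number of uses at which the flat fee costs no more than paying per use."""
    return math.ceil(flat_fee / pay_per_use_price)


def effective_price_per_use(flat_fee, uses):
    """What each use actually costs a subscriber who makes `uses` uses in the period."""
    if uses <= 0:
        raise ValueError("uses must be positive")
    return flat_fee / uses


# Hypothetical prices.
DAY_PASS = 15.0
MONTHLY_MEMBERSHIP = 60.0

print("break-even visits per month:", break_even_uses(DAY_PASS, MONTHLY_MEMBERSHIP))
print("30 day passes cost:", 30 * DAY_PASS)
print("member's price per visit at 30 visits:", effective_price_per_use(MONTHLY_MEMBERSHIP, 30))
```

The same shape carries over to tokens: replace "visit" with "token", and the rate limit caps how many uses a subscriber can make, which puts a floor under the effective per-token price.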