by zozbot234 39 days ago
> I don't think cloud models are going away; the hardware for good perf is expensive

I think local AI will win in its niche by repurposing users' existing hardware, especially as cloud hardware itself gets increasingly bottlenecked in all sorts of ways and the price of cloud tokens rises. You don't have to care about "bad" performance when you've got dedicated hardware that runs your workloads 24/7. Time-critical work that also requires the latest and greatest model can stay on the cloud, but a vast amount of AI work just isn't that critical.


Users do not have an existing $80k of hardware, are not going to buy $80k of hardware for worse performance than paying $100/month, and models are continuing to grow in size while memory grows in price.
You said you need $80k in hardware for "good performance". I'm saying the local AI inference workflow will be a lot more flexible about performance than that, and can get away with something vastly cheaper and in line with what the user owns already.
> paying $100/month

There will not ever be a monthly subscription for LLM tokens. The economics isn't there.

Local tokens will always be cheaper.

What's the basis for saying local tokens will always be cheaper? As others have outlined, LLMs serving one user at a time are pretty expensive, but concurrent users become much more cost-effective (assuming there's enough RAM for the contexts). If "local" to you means ~10 hours daily use by a team of employees, the company still has to balance against cloud services that can amortize non-recurring costs over 24 hours of service per day.
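To make the amortization point concrete, here is a rough sketch of the arithmetic. Every number below (hardware price, lifetime, power cost, throughput) is a made-up placeholder, not a measurement, and `cost_per_million_tokens` is just an illustrative helper. It assumes the machine produces tokens at a fixed aggregate rate whenever it is busy.

```python
def cost_per_million_tokens(hardware_cost, lifetime_years, power_cost_per_hour,
                            busy_hours_per_day, tokens_per_second):
    """Amortized cost per million tokens for a machine busy a fixed number of hours per day."""
    busy_hours = lifetime_years * 365 * busy_hours_per_day
    # Hardware is a one-time cost; power is only counted while the machine is busy.
    total_cost = hardware_cost + power_cost_per_hour * busy_hours
    total_tokens = tokens_per_second * 3600 * busy_hours
    return total_cost / total_tokens * 1_000_000


# Hypothetical figures: a $10k rig, 3-year lifetime, $0.10/hour of power,
# 50 tokens/s aggregate throughput across all concurrent requests.
office_hours = cost_per_million_tokens(10_000, 3, 0.10, 10, 50)
around_the_clock = cost_per_million_tokens(10_000, 3, 0.10, 24, 50)

print(f"10 h/day: ${office_hours:.2f} per million tokens")
print(f"24 h/day: ${around_the_clock:.2f} per million tokens")
```

Under these assumptions the same box is much cheaper per token when it is kept busy around the clock, because the fixed hardware cost is spread over more tokens. Higher concurrency would show up here as a larger `tokens_per_second`.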
Why would a team of employees not be able to run AI workloads 24/7? Not all workloads are time sensitive.
Both my experience and Anthropic's off-peak promotion indicate that demand is very uneven between peak and off-peak hours. How close do you think they are?
But that's demand for cloud inference that's priced on a flat-rate basis with some adjustments (like "off-peak hours"). Not a local rig where inference is effectively free aside from the cost of power whenever the system isn't congested.
There already are many subscriptions for LLM tokens: OpenAI, Claude, Synthetic (shameless plug), Zai...

I'm not sure what you mean by "There will not ever be a monthly subscription for LLM tokens." That already exists!

Monthly subscriptions are a "first hit is free" promo.

In the future LLMs will be priced per token, not all-you-can-eat.

LLM subscriptions are not "all you can eat": they have rate limits, and fundamentally there is no difference between subscription-with-rate-limit and typical usage-based business practices. Subscriptions are simply usage-based pricing with volume discounts in exchange for upfront payment; every single usage-based provider of pretty much anything offers the same kind of discounting for buying volume commitments upfront. From a business standpoint, though, subscriptions are even better than volume discounts because they're recurring, whereas reserved volume might not recur.

Subscriptions aren't gonna go away. They're great for businesses. Rate limits or pricing might change but the underlying business model is very good.

The reason usage-based is so much more expensive than subscription isn't that usage-based is the "true" cost and subscription is a loss leader. Likewise, buying 30 consecutive day passes to a gym costs more than a monthly membership, and that isn't because memberships are a loss leader. Memberships are the business model! The day passes are overpriced to steer you into buying the membership.