| HN Mirror



	by baq 2 hours ago
	Serving a barely useful GLM 5.2 costs what, $15k? Actually useful is like $50k? You’ll never recoup the cost, unless ‘locally’ means ‘inference provider is not the model provider’?

4 comments

dgellow 31 minutes ago

Yes, they mean open-weight models offered by various providers.


fractorial 1 hour ago

Not "local" in the literal sense, but I set it up to serve at half quant for $23/hr and full quant for $35/hr.

You don't need to have it always on? This is a far cry from "$200/month," but I do not think it's $50k for "useful." Do you see it differently?
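For a rough sense of scale, here is a back-of-the-envelope break-even sketch using only the figures quoted in this thread. Pairing the $15k figure with the half-quant rate and the $50k figure with the full-quant rate is an assumption, and the helper `break_even_hours` is a hypothetical name; power, depreciation, and resale value are ignored.

```python
def break_even_hours(purchase_usd: float, rental_usd_per_hour: float) -> float:
    """Hours of rental whose total cost equals the up-front purchase price."""
    if rental_usd_per_hour <= 0:
        raise ValueError("rental rate must be positive")
    return purchase_usd / rental_usd_per_hour


# Figures quoted in the thread. Which purchase price matches which
# quantization level is an assumption made for illustration only.
scenarios = {
    "half quant at $23/hr vs $15k purchase": (15_000, 23.0),
    "full quant at $35/hr vs $50k purchase": (50_000, 35.0),
}

for label, (purchase, rate) in scenarios.items():
    hours = break_even_hours(purchase, rate)
    # Dividing by 24 expresses the same figure as days of continuous use.
    print(f"{label}: {hours:.0f} hours (about {hours / 24:.0f} days always-on)")
```

If the box is only rented for a few hours a day, the break-even point in calendar time stretches out proportionally, which is the point of not keeping it always on.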


dakolli 27 minutes ago

This is probably the dumbest possible way to do it. Just buy tokens through OpenRouter and you could run it all month, 24/7, at 100 tps for practically nothing. There are tons of ways to pay for things without giving out your personal information.
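To put a number on "all month 24/7 at 100tps", a minimal sketch of the token volume follows. The 30-day month and the function names are assumptions, and the per-million-token price is left as a parameter because no price is quoted in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_tokens(tokens_per_second: float, days: int = 30) -> int:
    """Total tokens generated at a constant rate over `days` full days."""
    return int(tokens_per_second * SECONDS_PER_DAY * days)


def monthly_cost_usd(
    tokens_per_second: float, usd_per_million_tokens: float, days: int = 30
) -> float:
    """Cost of that token volume at a flat per-million-token price."""
    return monthly_tokens(tokens_per_second, days) / 1_000_000 * usd_per_million_tokens


tokens = monthly_tokens(100)
print(f"{tokens:,} tokens per 30-day month at 100 tps")

# Placeholder price for illustration only, not a real quote from any provider.
placeholder_price = 1.0
print(f"${monthly_cost_usd(100, placeholder_price):,.2f} at ${placeholder_price}/M tokens")
```

Substituting a provider's actual output-token price for `placeholder_price` gives a figure that can be set against the hourly rental rates mentioned above.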


verdverm 1 hour ago

$15k or $50k is pretty cheap, all things considered (a year ago it would have been more expensive; one person can spend that in a month or two).

Since I bought my Spark, the models have already improved (qwen3.6, speculative decoding 2x tgen, diffusion gemma 4x tgen), and I expect that to continue. Look out another 2-3 years and local is going to be very competitive.


polski-g 2 hours ago

You can recoup the costs quicker if you resell access to your local LLM on a reselling service.


baq 25 minutes ago

Last time I saw the numbers, it was cheaper to just buy T-bills.
