by walrus01 2 hours ago

I have already seen a number of people doing the math on what hardware it would take to self-host a Q8XL quantization of GLM5.2 shared among N people.

There are additional advantages: everything you query, all of your context cache, and everything it outputs stays private, and the service can't be arbitrarily turned off by external interference.

Personally, I think it's a fairly good bet that something with the 1TB of RAM needed to properly self-host GLM5.2 will still be a very usable piece of hardware 4 to 5 years from now. There will be even larger, newer models available, sure. But there will also be better models that continue to fit in the same size.
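For anyone who wants to redo that math themselves, here is a rough back-of-envelope sketch. Everything in it is an assumption for illustration: the function names, the parameter count, the prices, and the group size are placeholders, not real figures for GLM5.2 or any actual hardware. It only counts weight memory, so KV cache and runtime overhead would come on top.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def monthly_share(hardware_cost: float, power_per_month: float,
                  members: int, months: int) -> float:
    """Per-member monthly cost if hardware is written off over `months`."""
    return (hardware_cost / months + power_per_month) / members


if __name__ == "__main__":
    # Placeholder inputs only: a hypothetical 100B-parameter model at 8 bits.
    print(f"weights: {weights_gib(100, 8):.1f} GiB")
    # Placeholder inputs only: 24,000 hardware, 100/month power,
    # 10 members, 48-month write-off.
    print(f"share:   {monthly_share(24_000, 100, 10, 48):.2f} per month")
```

Plug in your own quotes and headcount; the point is just that the per-person figure falls quickly with N and with the write-off period.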

2 comments

dofm 1 hour ago

Back in the earlier days of the internet, when "dedicated servers" were a competitive advantage, hobbyists and small dev shops definitely shared dedicated hardware.

So you could see small LLM co-operatives working out, yeah.

But my thinking is that this four-to-five-year scenario just won't come to fruition, because the whole concept of needing to run these massive, massive models is somewhat more likely to be rendered moot by smaller models with better reasoning capacity, and possibly, even within that timescale, by hardware innovations.

One of the biggest problems I have with the whole "we won't be profitable until 2030" model is that 2030 is almost exactly as far into the future as the launch of ChatGPT is in the past. In that time, models far more capable than that first ChatGPT have become freely available to download and run on desktop hardware that existed before it launched, and the entire non-model functionality surrounding the original ChatGPT, plus many more functions, is now little more than a routine weekend coding project.

I don't know why the market would entertain the idea that no upset like that is possible in the same period of time again.

kristopolous 42 minutes ago

The biggest problem is that most AI will be local by 2030. Every future device you buy will have AI compute built in somewhere, the way devices now have on-device floating point.

On top of this, people are constantly coming up with better ways of running models on less specialized hardware, and "good enough" models now exist for most tasks.

So where does that leave the frontier labs? Drug discovery? Maybe some hard math problems? I mean it's not that big actually...

We're in a brief window where this is profitable, like batch computing was in the 70s. However, once your own device can do it, you're going to start migrating.

xienze 1 hour ago

> So you could see small LLM co-operatives working out, yeah.

Only on a pay-per-token basis, I think, unless it's a very tight-knit circle of folks. I doubt fixed monthly subscriptions would work in that model, because you'll get the inevitable: someone pegging the service 24/7 because it's "unlimited" while everyone else suffers.

dofm 1 hour ago

Well, many of us who shared hardware also ran monitoring to make sure the share was fair; there used to be a whole industry for that sort of quota stuff.

You can presumably hard-limit LLMs the same way: total quotas, burst quotas, etc.
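As a minimal sketch of what that could look like, assuming a per-user limiter sitting in front of the shared inference server: a token bucket handles the burst quota and a simple counter handles the total quota. The class name `TokenQuota` and its parameters are hypothetical, not taken from any existing serving stack.

```python
import time
from typing import Callable


class TokenQuota:
    """Per-user limiter: token bucket for bursts plus a hard total cap."""

    def __init__(self, burst: int, refill_per_sec: float, total: int,
                 clock: Callable[[], float] = time.monotonic):
        self.burst = burst                    # max tokens spendable at once
        self.refill_per_sec = refill_per_sec  # sustained rate
        self.remaining_total = total          # hard cap for the whole period
        self.clock = clock                    # injectable for testing
        self.bucket = float(burst)
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding the burst size.
        now = self.clock()
        earned = (now - self.last) * self.refill_per_sec
        self.bucket = min(self.burst, self.bucket + earned)
        self.last = now

    def try_spend(self, tokens: int) -> bool:
        """Return True and deduct if the request fits both quotas."""
        self._refill()
        if tokens > self.bucket or tokens > self.remaining_total:
            return False
        self.bucket -= tokens
        self.remaining_total -= tokens
        return True


if __name__ == "__main__":
    quota = TokenQuota(burst=2_000, refill_per_sec=50, total=1_000_000)
    print(quota.try_spend(1_500))  # fits in the initial burst
    print(quota.try_spend(1_500))  # refused until the bucket refills
```

A co-op would keep one `TokenQuota` per member and check it before forwarding each request; the person pegging the box 24/7 just starts getting refusals while everyone else keeps their share.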

(Suddenly getting a very fun flashback to the environment in which someone first explained Markov chains to me — MediaMOO. A text-based chat environment with configurable limits on the number of CPU "ticks" you were allowed in order to do things)

nok22kon 34 minutes ago

the same argument was made 2 years ago: "in 2 years we'll be able to run GPT-4 level models on an expensive laptop, most people will be using this instead of the fancy cloud models".

we are there: Gemma4/Qwen3.6 are GPT-4 level models runnable on a fancy laptop.

but expectations shifted; nobody wants a GPT-4 level model anymore