by walrus01
2 hours ago
I have already seen a number of people doing the math on what hardware it would take to self-host a Q8XL quantization of GLM5.2 shared among N people. There are additional advantages: everything you query, all of your context cache, and everything it outputs stays private, and it can't be arbitrarily turned off by external interference. Personally, I think it's a fairly good bet that something with the 1TB of RAM needed to properly self-host GLM5.2 will still be a very usable piece of hardware 4 to 5 years from now. There will be even larger, newer models available, sure. But there will also be better models that continue to fit in the same size.
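For what it's worth, the kind of back-of-the-envelope math people are doing can be sketched roughly like this. Everything in it is an assumption for illustration: the function names are made up, the price, member count and running cost are placeholders, and the weight-size estimate ignores KV cache and runtime overhead. It is not a claim about what GLM5.2 actually needs or what the hardware actually costs.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GiB.

    Assumption: size = parameters * bits per weight / 8.
    Ignores KV cache, activations and per-block quantization overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def monthly_cost_per_member(
    hardware_cost: float,
    members: int,
    lifetime_years: float,
    monthly_running_cost: float = 0.0,
) -> float:
    """Hardware cost spread evenly over its useful life, plus running costs
    (power, hosting), split equally among the co-op members."""
    if members < 1 or lifetime_years <= 0:
        raise ValueError("need at least one member and a positive lifetime")
    months = lifetime_years * 12
    return (hardware_cost / months + monthly_running_cost) / members


if __name__ == "__main__":
    # Hypothetical numbers: a 60,000 machine, 10 members,
    # 5-year useful life, 200 per month for power and hosting.
    print(monthly_cost_per_member(60_000, 10, 5, 200))
```

The 4-to-5-year lifetime is the one input taken from the comment; the longer the hardware stays useful, the lower the per-member figure, which is why that bet matters so much to the co-op argument.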
So you could see small LLM co-operatives working out, yeah.
But my thinking is that this four-to-five-year scenario just won't come to fruition, because the whole concept of needing to run these massive, massive models is slightly more likely than not to be rendered moot by smaller models with better reasoning capacity, and possibly, even on that timescale, by hardware innovations.
One of the biggest problems I have with the whole "we won't be profitable until 2030" model is that 2030 is almost exactly as far into the future as the launch of ChatGPT is in the past. In that time, models far more capable than that first ChatGPT have become freely available to download and run on desktop hardware that existed before it launched, and the entire non-model functionality surrounding that original ChatGPT, plus many more functions, is now not much more than a routine weekend coding project.
I don't know why the market would entertain the idea that no upset like that is possible in the same period of time again.