by aeve890 3 days ago
Please correct me if I'm wrong, I'm totally out of my field here, but what's the point of SOTA models that can only be run by hyperscalers? I mean, GLM-5.2 is open source, but with 1.5TB of weights, who can really run it? It still needs dozens of H100s. Those 753B parameters quantized down to Q4 (~400GB) would still require datacenter-level hardware. Going down to Q2 would still require serious hardware, way out of reach for most users, and you'd be far from the SOTA benchmark of the full-precision model. I get it, it's open source, but it isn't really democratizing LLMs for anyone except compute providers. It's not like, let's say, Kubernetes. I can run k8s fully in my shitty homelab, without "quantization", exactly like Google does in their datacenters.
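For anyone who wants to check the sizes quoted above, here is a rough back-of-the-envelope sketch. It assumes the 753B parameter count from the comment and counts weights only (no KV cache, no activations); the function name and the list of precisions are just illustrative.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB.

    params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


PARAMS_B = 753  # parameter count quoted in the comment (in billions)

# Nominal bit widths; an assumption that ignores per-block scales and metadata.
for label, bits in [("BF16", 16), ("FP8", 8), ("Q4", 4), ("Q2", 2)]:
    print(f"{label}: ~{weight_size_gb(PARAMS_B, bits):.0f} GB")
```

At 16 bits this gives about 1.5TB, which matches the figure above. A nominal 4 bits gives a bit under 400GB; real 4-bit formats store scaling factors alongside the weights, so the effective bits per weight is somewhat higher than 4, which is consistent with the ~400GB estimate.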
2 comments

SOTA models can be run by anybody with compute capacity. You can pay for GLM 5.2 inference right now via Fireworks AI and presumably several dozen other providers. So if you don't want vendor lock-in and rug-pulling (Anthropic has churned on their subscription model like 4-5 times in the past month) you can just pay an inference provider and have far more control over your environment.
If you have a ton of capital, you still can't spin up Claude Opus and compete on price with Anthropic with your new fancy optimizations. With open models you can and that is great for consumers.
>If you have a ton of capital

That's my point. This "open source" doesn't feel like real open source. It's open only for the few with a ton of capital, and mostly in the US or US-adjacent markets. It's as if SpaceX published an open source rocket design and people celebrated like it's the new Linux. It feels more like a goodwill gesture than something with real impact for the benefit of mankind, which is the spirit of open source software as commonly understood.

The point is that you need several orders of magnitude less capital to run GLM-5.2 than to train a model like Opus or GLM-5.2 from scratch. To run GLM-5.2 inference you'd need an investment of roughly €300k or less (8x H200 serving GLM-5.2 at FP8), which is completely feasible for a lot of hosting businesses.
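A quick sanity check on the 8x H200 figure, as a sketch rather than a sizing guide. It assumes the 753B parameter count mentioned upthread, 141GB of memory per H200 and 80GB per H100 (published specs), and a 20% headroom for KV cache and activations, which is my own guess and depends heavily on context length and batch size.

```python
import math

H200_GB = 141  # memory per H200
H100_GB = 80   # memory per H100 (80GB variant)


def min_gpus(weights_gb: float, gpu_gb: float, headroom: float = 0.2) -> int:
    """Smallest GPU count whose combined memory holds the weights.

    `headroom` is the fraction of each GPU reserved for KV cache and
    activations; 0.2 is an assumed value, not a measured one.
    """
    usable_gb = gpu_gb * (1 - headroom)
    return math.ceil(weights_gb / usable_gb)


fp8_weights_gb = 753 * 8 / 8     # 753B parameters at 8 bits each
bf16_weights_gb = 753 * 16 / 8   # same model at 16 bits each

print("FP8 on H200: ", min_gpus(fp8_weights_gb, H200_GB))
print("BF16 on H100:", min_gpus(bf16_weights_gb, H100_GB))
```

Under those assumptions the FP8 weights fit in a single 8x H200 node with room to spare, while full-precision weights on 80GB H100s need a couple dozen cards, which is where the "dozens of H100s" impression comes from.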

Even if end-users can't run these models themselves at home, there are a lot more and varied options to choose from, especially considering privacy and data protection.

You can apparently also do GLM-5.2 at Q4_K_XL with 2x RTX 3090 and lots of RAM [1], but I don't think that counts as a potential frontier model.

[1] https://news.ycombinator.com/item?id=48639186

Don't compare with training; compare running GLM-5.2 with paying for a Claude enterprise subscription, right?