by aeve890
3 days ago
Please correct me if I'm wrong, I'm totally out of my field here, but what's the point of SOTA models that can only be run by hyperscalers? I mean, glm-5.2 is open source, but with 1.5TB of weights, who can really run it? It still needs dozens of H100s. Those 753B parameters quantized down to Q4 (~400GB) would still require datacenter-level hardware. Even Q2 would require serious hardware, way out of reach for most users, and you'd be far from the SOTA benchmarks of the full-precision model. I get it, it's open source, but it's not quite democratizing LLMs for anyone except compute providers. It's not like, let's say, Kubernetes. I can run k8s fully in my shitty homelab, without "quantization", exactly like Google does in their datacenters.
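For what it's worth, here is the back-of-the-envelope math behind those sizes as a rough sketch. It assumes weights dominate memory, 16 bits per weight at full precision, and 80GB per H100; the 80GB figure and the helper names `weight_size_gb` and `min_gpus` are my own assumptions for illustration, not anything published for this model.

```python
import math

PARAMS = 753e9       # parameter count quoted in the comment
H100_MEM_GB = 80     # assumption: 80 GB of memory per H100


def weight_size_gb(params, bits_per_weight):
    """Size of the raw weights in GB (decimal), ignoring any format overhead."""
    return params * bits_per_weight / 8 / 1e9


def min_gpus(size_gb, gpu_mem_gb=H100_MEM_GB):
    """Lower bound on GPUs needed just to hold the weights."""
    return math.ceil(size_gb / gpu_mem_gb)


for name, bits in [("FP16", 16), ("Q4", 4), ("Q2", 2)]:
    size = weight_size_gb(PARAMS, bits)
    print(f"{name}: ~{size:.0f} GB of weights, at least {min_gpus(size)} x 80 GB GPUs")
```

These are lower bounds on weights only. Real quantized files also store scales next to the weights, so they come out somewhat larger than the pure bit-width math suggests, and the KV cache and activations need memory on top of that.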