Hacker News
by CamperBob2 3 days ago
My impression is that with the latest round of high-profile releases, the open-weight "market" is coalescing around two players, DS4 Flash for speed and GLM 5.2 for smarts. Qwen is being left behind to pick up the scraps for the terminally GPU-poor.

We know they have what it takes to fight back, and they know it... so I agree, there's no reason not to be optimistic about future Qwen releases. But then I've never really understood what motivates these releases in the first place.

2 comments

DeepSeek V4 Pro seems to have significantly lower overhead than GLM 5.2 for the same context size. If the two are about equally smart, that's not a very good look for GLM. E.g. the KV-cache storage for GLM at full context is significantly larger, which directly impacts the effectiveness of batching on memory-constrained hardware. Keep in mind that the existing DeepSeek Pro is a preview model, so we might be about to see further iterations of it released. Hopefully the GLM folks will pick up these techniques for GLM 6 or something; the model itself is quite nice, after all. It's just noticeably harder to run on limited local platforms.
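
For a rough sense of why KV-cache size matters for batching, here is a back-of-the-envelope sketch. It assumes plain grouped-query attention, an fp16 cache, and made-up layer and head counts; none of these are the real GLM or DeepSeek configurations, and the function names are just for illustration.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache bytes per token for plain multi-head / grouped-query attention.

    The leading factor of 2 covers the separate key and value tensors.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(free_bytes, bytes_per_token, context_len):
    """How many full-context sequences fit in the memory left after weights."""
    return free_bytes // (bytes_per_token * context_len)


# Hypothetical configs, NOT the real GLM or DeepSeek numbers.
heavy = kv_bytes_per_token(layers=60, kv_heads=8, head_dim=128)
light = kv_bytes_per_token(layers=60, kv_heads=1, head_dim=128)

free = 16 * 1024**3  # assume 16 GiB left over for the KV cache
ctx = 32 * 1024      # assume 32K tokens of context per sequence

print(heavy, max_concurrent_sequences(free, heavy, ctx))  # 245760 2
print(light, max_concurrent_sequences(free, light, ctx))  # 30720 17
```

With these invented numbers, the config with the 8x larger per-token cache fits 2 full-context sequences in the same memory where the leaner one fits 17, which is the kind of batching gap being described.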
> If the two are about equally smart, that's not a very good look for GLM.

They aren't, though. GLM 5.2 is very far out in front of everybody else in the open-weight business when it comes to coding. They seem to have put a disproportionate effort into improving coding, and while it paid off for that, it does seem to have cost some efficiency.

You could say that GLM 5.2 is to DS4 as Fable is to Opus. Fable is no better at a lot of tasks than Opus, but it codes like nothing else ever built.

Qwen still have the best models that actually run on a laptop - Gemma 4 is their best competition there.
That's only really true if one ignores the possibility of SSD offloading, which effectively opens up inference with far larger models. It's possible that the combination of batched inference and SSD streaming may be even more effective, though only for selected models with especially efficient KV storage, or perhaps very small inference contexts.
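
A rough way to see why batching and SSD streaming could combine well, assuming decode speed is bound purely by SSD reads and that every sequence in a batch reuses the same streamed weights (optimistic for MoE models, where different sequences may hit different experts). All sizes and bandwidths below are hypothetical.

```python
def streamed_decode_tokens_per_sec(active_weight_bytes, ram_cached_bytes,
                                   ssd_bytes_per_sec, batch_size=1):
    """Upper bound on decode throughput when part of the weights stream from SSD.

    Assumes each decode step re-reads the non-cached weights once and that
    one step produces one token for every sequence in the batch.
    """
    streamed = max(active_weight_bytes - ram_cached_bytes, 0)
    if streamed == 0:
        return float("inf")  # nothing is streamed, so the SSD is not the bound
    steps_per_sec = ssd_bytes_per_sec / streamed
    return steps_per_sec * batch_size


# Hypothetical numbers: 20 GB of weights touched per step, 8 GB of them
# cached in RAM, and an SSD that sustains 6 GB/s of reads.
active = 20 * 10**9
cached = 8 * 10**9
ssd_bw = 6 * 10**9

single = streamed_decode_tokens_per_sec(active, cached, ssd_bw, batch_size=1)
batched = streamed_decode_tokens_per_sec(active, cached, ssd_bw, batch_size=8)
print(single, batched)  # 0.5 4.0
```

The catch is that each extra sequence also needs its own KV cache in RAM, which is why this would only pay off with compact KV storage or short contexts.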