I gave the same prompt (a small Rust project that's not easy, but not overly sophisticated) to both Gemma 4 26b and Qwen 3.5 27b via OpenCode. Qwen 3.5 ran for a bit over an hour before I killed it; Gemma 4 ran for about 20 minutes before it gave up. Lots of failed tool calls.

I asked Codex to write a summary of both codebases, with Qwen 3.5 as "Dev 1" and Gemma 4 as "Dev 2":

Dev 1 is the stronger engineer overall. They showed better architectural judgment, stronger completeness, and better maintainability instincts. The weakness is execution rigor: they built more, but didn't verify enough, so important parts don't actually hold up cleanly.

Dev 2 looks more like an early-stage prototyper. The strength is speed to a rough first pass, but the implementation is much less complete, less polished, and less dependable. The main weakness is lack of finish and technical rigor.

If I were choosing between them as developers, I'd take Dev 1 without much hesitation.

Looking at the code myself, I'd agree with Codex.
Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day.
[0]: https://github.com/ggml-org/llama.cpp/pull/21326
[1]: https://github.com/ggml-org/llama.cpp/issues/21316