by mosselman
24 days ago
What a nonsense, generated article.

> For context: GLM 5.1 ran the same task and reached 7.3x. Kimi K2.6 reached 5x. DeepSeek V4 Pro reached 3.3x. The models that stopped early did so because they issued no tool calls for five consecutive rounds, they concluded they couldn’t make further progress and stopped. Qwen3.7-Max didn’t stop.

By this reasoning I could release a model that lacks all the basic optimisations, have it optimise itself for hours to reach 20x the throughput, and then claim that the model is superior to the others. I am not saying that is what happened here, but the reporting is abysmal.
I just switched to the latest codewhale agent (in Rust), and it would perform much better by those criteria: a much better async IO implementation and orchestration, and no more deadlocks as in the typical TypeScript tooling. It just doesn't stop out of the blue, as Claude, Kimi or opencode do.