by mosselman 24 days ago

What a nonsensical, generated article.

> For context: GLM 5.1 ran the same task and reached 7.3x. Kimi K2.6 reached 5x. DeepSeek V4 Pro reached 3.3x. The models that stopped early did so because they issued no tool calls for five consecutive rounds, they concluded they couldn’t make further progress and stopped. Qwen3.7-Max didn’t stop.

By this reasoning, I could release a model that lacks all the basic optimisations, have it optimise itself for hours to reach 20x the throughput, and then claim that it is superior to the others?

I am not saying that is what happened here, but the reporting is abysmal.
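
To make the concern concrete: a reported speedup such as the 7.3x above is a ratio against some baseline, so the same end result can look very different depending on where it started. A minimal sketch with made-up timings (hypothetical numbers, not taken from the article):

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


# Hypothetical timings (not from the article): both runs end at the same 1.0 ms.
from_tuned_baseline = speedup(baseline_ms=5.0, optimized_ms=1.0)
from_naive_baseline = speedup(baseline_ms=20.0, optimized_ms=1.0)
print(f"{from_tuned_baseline:.1f}x vs {from_naive_baseline:.1f}x")  # 5.0x vs 20.0x
```

Identical final kernels, a 4x difference in the headline number: this is why the baseline has to be reported alongside the ratio.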

rurban 24 days ago

It is not the model's job to stop or continue; it's the agent's. Qwen has nothing to do with it.

Right now I have switched to the latest codewhale agent (written in Rust), and by his criteria it performs much better: a much better async I/O implementation and orchestration, and no more deadlocks like those in typical TypeScript tooling. It just doesn't stop out of the blue, as Claude, Kimi, or opencode do.
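
The stop rule quoted in the article (no tool calls for five consecutive rounds) is the kind of check that lives in the agent harness loop rather than in the model, which is the point being made here. A minimal sketch, assuming a hypothetical `call_model` function that returns the list of tool calls the model issued in a given round; only the threshold of five comes from the quote, everything else is illustrative:

```python
from typing import Callable, List


def run_agent(call_model: Callable[[int], List[str]],
              max_idle_rounds: int = 5,
              max_rounds: int = 100) -> int:
    """Run rounds until the model issues no tool calls for max_idle_rounds in a row.

    Returns the number of rounds executed. The stop decision is made here,
    in the harness, not by the model. `call_model` is hypothetical.
    """
    idle = 0
    for round_no in range(1, max_rounds + 1):
        tool_calls = call_model(round_no)
        if tool_calls:
            idle = 0  # any tool call resets the idle counter
            # ... execute the tool calls and feed results back (omitted) ...
        else:
            idle += 1
            if idle >= max_idle_rounds:
                return round_no
    return max_rounds


def fake_model(round_no: int) -> List[str]:
    """Hypothetical model: calls a tool for three rounds, then goes quiet."""
    return ["profile_kernel"] if round_no <= 3 else []


print(run_agent(fake_model))  # 8: three active rounds, then five idle ones
```

Changing `max_idle_rounds` (or dropping the check) changes when a run ends without touching the model at all.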

big-chungus4 24 days ago

It optimized the Extend Attention operator in Triton. All models were optimizing the same operator.

hobofan 24 days ago

They didn't optimize their own kernels or their own runtime, which I think is what you are implying.
