by kir-gadjello 152 days ago

I don't think it's strictly better than GLM 5; they're more like peers (though in math competitions StepFun is stronger than most), and in my experience they have a similar coding/bugfix ceiling where world knowledge is not the deciding factor. But I didn't test GLM 5 for more than 30 hours, and my agentic harness (opencode) might be suboptimal. I'm open to the idea that GLM 5 with the right agentic harness is ready for ultra-long autonomy, but I have yet to see it myself.

Where GLM 5 is strictly worse for me though, compared to StepFun, is long-form content generation (planning, research documents) - but the same can be said about the Gemini models, and those are obviously very smart models.

Given the free option I'd explore GLM 5 more, but if I had to pay for it myself, ofc I'd choose StepFun every time. Basically, I think the optimal configuration right now for maximizing output of correct software features per dollar involves using StepFun, or a future competitor in its class, for bulk coding and first-stage code review.

Maybe I need to write a blogpost about it after all.

1 comment

Aerroon 152 days ago

I tried them both out with the task of creating a todo-like web app (you can use the chat interface for GLM 5 for free if there's capacity). GLM 5 ended up with a working version. Sadly, StepFun's didn't quite function right. The main issue was that it put everything that should have been in different columns into a single one. I didn't prompt it further to fix it, but it seems relatively capable. I think it beat what the big Qwen model came up with.

What's really surprising to me is the cost of the model. It's definitely very good for its price. DeepSeek is the only one that offers any competition to it at that price point (GLM 5 is literally 10x more expensive).