Wow, I'm really surprised. I tried deepseek (their best model, through the official API). It's extremely cheap, but it's clearly not as good at programming as Opus 4.7. It seems nowhere near as good at making high-level design choices. Deepseek also seems to get stuck in whack-a-mole fixing loops much more than Opus. I stopped it at one point and asked Opus to solve the problem it was trying to solve, and it saw the solution immediately.
I was running deepseek through claude's code agent harness. Maybe it works better through a different tool?
I've given V4 Pro some curly things and I was impressed at how it figured them out. I agree high-level design is not its forte. But it sat in a loop and doggedly debugged a crazy dependency issue, arriving at the right answer over the course of 15 minutes, which impressed me.
Idk, I don’t vibe code so even the flash model is great for generating code for myself. I tend to do the planning and design myself though.
Harness also matters, and so does the provider. I was using OpenRouter and switched to the DeepSeek API, and suddenly all the tool call issues I was having resolved themselves. Flash is so damn fast at doing stuff like generating boilerplate I can’t go back to the bigger, slower models.
I feel you. I'd prefer to stick entirely with local open source models. I tried using Aider and Qwen last week, and while it's still impressive what it can do with just local resources and entirely for free, its error rate is too high, and it's clearly not remotely in the same league as Claude Code.
Interestingly, I had the same experience, and weirdly it's in part because it is clearly less intelligent. It's more of a mechanistic tool just doing what I ask (but still very smart and very competent about it) and less trying to win a Nobel Prize with each answer. Turns out I actually like that.