by nothinkjustai 39 days ago
Because of marketing and vibes mostly.

Heck, I prefer DeepSeek to both of those.


Wow, I'm really surprised. I tried DeepSeek (their best model, through the official API). It's extremely cheap, but it's clearly not as good at programming as Opus 4.7. It seems nowhere near as good at making high-level design choices. DeepSeek also seems to get stuck in whack-a-mole fixing loops much more than Opus does. At one point I stopped it and asked Opus to solve the problem DeepSeek was trying to solve, and Opus saw the solution immediately.

I was running DeepSeek through Claude Code's agent harness. Maybe it works better through a different tool?

I've given V4 Pro some curly things and I was impressed at how it figured them out. I agree high-level design is not its forte. But it sat in a loop and doggedly debugged a crazy dependency issue over the course of 15 minutes until it came to the right answer, which impressed me.
You tried v4?
I tried to like it, but it eventually got stuck in a near-infinite loop trying to debug an extra curly bracket in an iOS app.

That and the lack of image-read support surprised me. I'm a big fan of feeding screenshots into my LLM, and that killed it for me.

Yeah, v4.

I would have been much more impressed with v4 about 6 months ago. But I've been spoiled by Opus 4.7. DeepSeek isn't at the same level.

Idk, I don't vibe code, so even the Flash model is great for generating code for myself. I tend to do the planning and design myself, though.

Harness also matters, and so does provider. I was using OpenRouter and switched to the DeepSeek API, and suddenly all the tool-call issues I was having resolved themselves. Flash is so damn fast at stuff like generating boilerplate that I can't go back to the bigger, slower models.

I feel you. I'd prefer to stick entirely with local open source models. I tried using Aider and Qwen last week, and while it's still impressive what it can do with just local resources and entirely for free, its error rate is too high, and it's clearly not remotely in the same league as Claude Code.
Interestingly, I had the same experience, and weirdly it's partly because it is clearly less intelligent. It's more of a mechanistic tool that just does what I ask (while still being very smart and very competent about it) rather than trying to win a Nobel Prize with each answer. Turns out I actually like that.