by realmofthemad 6 days ago
Am I missing out? I can definitely tell the difference in quality between Claude Opus and smaller models. The smaller models are much more likely to make mistakes or to get stuck on random stuff.

Maybe I just haven't been trying the right models?


It's not just you. I tried an Opencode Go subscription, and experimented with most of their models (GLM, Kimi, Qwen, Deepseek), and none of them got anywhere close to Opus - the difference in quality was very noticeable, especially with Deepseek V4 Pro and Flash.

The only caveats: I didn't play around with Qwen 3.7 Max very much, and of course these models are far cheaper than Opus.

But any suggestion that Deepseek approaches Opus in terms of quality/intelligence immediately makes me suspect propaganda - it's that noticeable of a difference.

> But any suggestion that Deepseek approaches Opus in terms of quality/intelligence immediately makes me suspect propaganda - it's that noticeable of a difference.

The argument was never that DeepSeek is on a level with Opus - the argument is that DeepSeek is good enough for the majority of day-to-day engineering tasks (where Opus is decidedly overkill).

Absolutely. DeepSeek's cost is roughly comparable to Haiku's (assuming a reputable Western provider, not DeepSeek's own API), whereas its average capabilities sit comfortably above Sonnet.

Yes, but no. Honestly, except for frontend/IaC, where I still use frontier models, I use smaller models whenever I can.

Even the latest Opus on High doesn't really get what is needed. It needs careful steering and rewriting in most cases, and the code it produces is often hard to review.

I'd rather launch a smaller model in plan mode, argue with it, and have it implement the scaffolding I will write the code into. Writing code is often faster once you know what you want, and AI's most useful ability is to act as a canary that also proposes ideas. I find this method faster than generating everything and then reading the code to find mistakes or to understand why it used X instead of Y.

I don't really read generated frontend code anyway (nor does anybody on my team care), so I generate it and push it if it does what I want. IaC is mostly boilerplate: usually only 1-2 lines matter, and a dozen at worst. If you know where to look (and check that the AI isn't suffering from NIH), generated IaC is really easy to review.