Hacker News
by swiftcoder 6 days ago
> and will judge, like any sane person, that US frontier models have stopped earning their multiplier

I think that this is on the money, although I'd place the bar even lower - DeepSeek v4 Flash is sufficient for basically all day-to-day coding tasks.

You might want something beefier for a complicated reverse-engineering project, but it will competently one-shot a decently complicated app or API - and a $10/month OpenCode Go subscription is sufficient to keep you in tokens for such a cost-efficient model...

Similarly, my employer hands us all Cursor, and I've yet to actually switch it out of "auto" mode, which mostly runs Composer (their in-house finetune of Kimi 2.5).

3 comments

I think the situation is even more severely ridiculous than that. Google is still good enough just like it was well over a decade ago.

Most people don't have workloads that demand agentic workflows to begin with, and if their employer is pushing for that it's probably a startup that underpays or a coding sweatshop full of nepotism that fires fast.

"People shouldn't need to use AI unless they're overworked, Google is good enough."

But why should I work harder than necessary to do the same job? Why wouldn't I want to use the best tools available?

You're likely not doing the "same job". If you really want to take that stance, you can take it so easy that you're unemployed and your employer can vibecode their business into the ground.

The best tool available is your own mind. You should at least understand what you're being held accountable for.

We already went through all this shit the previous decade with copypasta and the decade before that with WYSIWYG and the decade before that...

Am I missing out? I feel like I can definitely tell the difference in quality between Claude Opus and other, smaller models. The smaller models are much more likely to make mistakes or to get stuck on random stuff.

Maybe I just haven't been trying the right models?

It's not just you. I tried an OpenCode Go subscription and experimented with most of their models (GLM, Kimi, Qwen, DeepSeek), and none of them got anywhere close to Opus - the difference in quality was very noticeable, especially with DeepSeek V4 Pro and Flash.

The only caveats: I didn't play around with Qwen 3.7 Max very much, and of course these models are far cheaper than Opus.

But any suggestion that DeepSeek approaches Opus in terms of quality/intelligence immediately makes me suspect propaganda - the difference is that noticeable.

> But any suggestion that DeepSeek approaches Opus in terms of quality/intelligence immediately makes me suspect propaganda - the difference is that noticeable.

The argument was never that DeepSeek is on a level with Opus - the argument is that DeepSeek is good enough for the majority of day-to-day engineering tasks (where Opus is decidedly overkill).

Absolutely. The cost comparison is roughly between DeepSeek and Haiku (assuming a reputable Western provider, not DeepSeek's own API), whereas the average capabilities sit comfortably above Sonnet.

Yes, but no. Honestly, except for frontend/IaC, where I still use frontier models, I will use smaller models whenever I can.

Because even the latest Opus on High doesn't really get what is needed; it needs careful steering and a rewrite in most cases, and the code is often hard to review.

I'd rather just launch a smaller model in plan mode, argue with it, and have it implement the scaffolding I will write the code into. Writing code is often faster once you know what you want, and AI's most useful ability is to be a canary that also proposes stuff. I find my method faster than generating everything and then reading the code to find mistakes or to understand why it used X instead of Y.

I don't really read generated frontend code anyway (nor does anybody on my team care), so I generate it and push it if it does the stuff I want it to do. For IaC, it's mostly boilerplate except for 1-2 lines most of the time, and at worst a dozen; if you know where to look (and check that the AI doesn't suffer from NIH), it's really easy to review generated code.

I'll root for DeepSeek v4 Flash as well. It surprised me just how "good enough" it is for most of my needs, and also dirt cheap. Everyone should try it at least once.

Fired it up via the $5 OpenCode Go subscription and am stoked. This is an amazing amount of capability for pennies on the dollar. I'm using it alongside my Codex and Claude Max subs. Just fantastic for coding that Claude is architecting.

> This is an amazing amount of capability for pennies on the dollar.

True. I wonder how long OpenCode can keep subsidizing the $10/mo Go plan. Its weekly and monthly limits already seem restrictive for some of the most capable models, like Qwen 3.7 Max and GLM 5.1. That said, if the tasks are doable by DeepSeek v4 Pro, Mimo 2.5 Pro, and Qwen 3.7 Plus, then it is indeed a super nice deal. I don't have many complaints, other than that these models sometimes require more detailed instructions than Claude Sonnet / GPT 5.x did.

+1, it's good enough for what I need to do as a DevOps engineer.