Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.
The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?
No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.
> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?
I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, including GLM 5.1.
GLM is not bad, but the hardware needed will never pay for itself vs. just using a commercial provider through its API.
That being said, you're being reductive here. For many use cases local models offer advantages that can't be obtained through a commercial API: privacy, ownership of the entire stack, predictability. They can't be rugpulled, they can't snitch on you. They will not give you a 503.
Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.
That being said, I know that many people are like you: they don't give a second thought to privacy. They'd plug Anthropic into their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.
I have it as a failover for Opus 4.6 in an internal Claude proxy. People don't notice a thing when it triggers, apart from maybe a failed tool call here and there (the harness remains CC, not OC), a context window that has gone over 200k tokens, or an image attachment that GLM does not handle; otherwise it's hunky-dory all the way. I would also use it as a permanent replacement for Haiku in this proxy to lower Claude costs, but I have not tried that yet. Opus 4.7 has shaken our setup badly and we might look into moving to Codex 100% (GLM could remain useful there too).
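For anyone curious what that failover decision might look like, here is a minimal sketch, not the actual proxy. The names (`Request`, `route`, `FALLBACK_CONTEXT_LIMIT`) are hypothetical, and the 200k-token and no-image limits are simply the two failure cases mentioned in the comment above.

```python
from dataclasses import dataclass

# Assumption: the fallback model breaks past ~200k tokens, per the comment above.
FALLBACK_CONTEXT_LIMIT = 200_000


@dataclass
class Request:
    prompt_tokens: int
    has_image: bool = False


def fallback_can_handle(req: Request) -> bool:
    """True if the fallback model is expected to serve this request."""
    return req.prompt_tokens <= FALLBACK_CONTEXT_LIMIT and not req.has_image


def route(req: Request, primary_healthy: bool) -> str:
    """Pick a backend: primary when it is up, fallback only when it can cope."""
    if primary_healthy:
        return "primary"
    if fallback_can_handle(req):
        return "fallback"
    # Neither backend can serve this; surface an error instead of failing silently.
    return "reject"
```

Rejecting up front for oversized or image-bearing requests is one possible design choice; the setup described above apparently lets those requests fail downstream instead.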
That's a lame attitude. There are local models that are last year's SOTA, but that's not good enough because this year's SOTA is even better yet still...
I've said it before and I'll say it again, local models are "there" in terms of true productive usage for complex coding tasks. Like, for real, there.
The issue right now is that buying the compute to run the top-end local models is absurdly unaffordable, both in general and because you're outbidding LLM companies for limited hardware resources.
If you have a $10K budget, you can legit run last year's SOTA agentic models locally and do hard things well. But most people don't or won't, nor does it make cost-effective sense vs. currently subsidized API costs.
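To put rough numbers on the cost argument, here is a back-of-the-envelope sketch. The $10K figure is from the comment above; the $200/month is an assumption standing in for the "couple hundred bucks a month" mentioned upthread, and `break_even_months` is a hypothetical helper that ignores resale value, hardware depreciation, and API price changes.

```python
def break_even_months(
    hardware_cost: float,
    monthly_api_cost: float,
    monthly_running_cost: float = 0.0,
) -> float:
    """Months until local hardware pays for itself versus an API subscription."""
    monthly_saving = monthly_api_cost - monthly_running_cost
    if monthly_saving <= 0:
        # Running locally costs as much as the API or more: it never pays off.
        return float("inf")
    return hardware_cost / monthly_saving


# Assumed inputs: $10K of hardware versus a $200/month subscription.
print(break_even_months(10_000, 200))  # 50.0
```

Under those assumptions the hardware takes a bit over four years to pay for itself, and longer once electricity is counted via `monthly_running_cost`.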
I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models outperform older, smaller ones. I genuinely hope this changes, though. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like it's something that should be gated.
Right, but every major generation has had diminishing returns on the last. Two years ago the difference was HUGE between major releases, and now we're discussing Opus 4.6 vs. 4.7 and people cannot seem to agree if it is an improvement or a regression (and even their data in the card shows regressions).
So my point is: if you have the attitude that unless it is the bleeding edge, it may as well not exist, then local models are never going to be good enough. But the truth is they're now well exceeding what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.
I feel like I'm going to have to try the next model. For a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current workflow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see I guess. Don't get me wrong, I'd love to settle down with one model and I'd love it to be something I could self host for free.
> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.
Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluing what our time is worth? Do you really think we can all keep our fancy paychecks while we keep using AI?
Do you think if you or I stopped using AI that everyone else would too? We're still what we always were: problem solvers who have gained the ability to learn and understand systems better than the general population, and to communicate clearly (to humans and now AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.
> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.
There were always jobs that required those "many more skills" but didn't require any programming skills.
We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.
Now you are willingly jumping into that position without realising that the lag between your value (i.e. half your salary, or less) and your pay will eventually disappear.
I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BAs I've worked with vary in technical capability from 'having coded before and understanding DB schema basics and network architecture' to 'I know how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people to understand what's going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or fewer of those people, we will see.
First, making sure to offer an upvote here. I happen to be VERY enthusiastic about local models, but I've found them to be incredibly hard to host, incredibly hard to harness, and, despite everything, remarkably powerful if you are willing to suffer really poor token/second performance...