| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by aliljet 65 days ago
	What open models are truly competing with both Claude Code and Opus 4.7 (xhigh) at this stage?

4 comments

xvector 64 days ago

Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.

The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.

tredre3 64 days ago

> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, included GLM 5.1.

GLM is not bad but the hardware needed will never recoup the ROI vs just using a commercial provider through its API.

That being said, you're being reductive here. For many use cases local models offer advantages that can't obtained through a commercial API : Privacy, ownership of the entire stack, predictability. They can't be rugpulled, they can't snitch on you. They will not give you 503.

Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.

That being said I know that many people are like you, they don't give a second thought about privacy. They'd plug Anthropic to their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.

WarmWash 64 days ago

Glm 5.1 getting 5% on ARC-AGI 2 private is all anyone needs to know.

parinporecha 65 days ago

I've had a good experience with GLM-5.1. Sure it doesn't match xhigh but comes close to 4.6 at 1/3rd the cost

slopinthebag 64 days ago

1/3? Try 2/13 :P

5.1 is like $4 / 1m output, Opus 4.6 is $25. GPT 5.4 pro is $270 with large contexts :O

esafak 65 days ago

GLM 5.1 competes with Sonnet. I'm not confident about Opus, though they claim it matches that too.

ojosilva 65 days ago

I have it as failover to Opus 4.6 in a Claude proxy internally. People don't notice a thing when it triggers, maybe a failed tool call here and there (harness remains CC not OC) or a context window that has gone over 200k tokens or an image attachment that GLM does not handle, otherwise hunky-dory all the way. I would also use it as permanent replacement for haiku at this proxy to lower Claude costs but have not tried it yet. Opus 4.7 has shaken our setup badly and we might look into moving to Codex 100% (GLM could remain useful there too).

Someone1234 65 days ago

That's a lame attitude. There are local models that are last year's SOTA, but that's not good enough because this year's SOTA is even better yet still...

I've said it before and I'll say it again, local models are "there" in terms of true productive usage for complex coding tasks. Like, for real, there.

The issue right now is that buying the compute to run the top end local models is absurdly unaffordable. Both in general but also because you're outbidding LLM companies for limited hardware resources.

You have a $10K budget, you can legit run last year's SOTA agentic models locally and do hard things well. But most people don't or won't, nor does it make cost effective sense Vs. currently subsidized API costs.

gbro3n 65 days ago

I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models out perform older, smaller ones. I genuinely hope this changes through. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like it's something that should be gated.

Someone1234 65 days ago

Right; but every major generation has had diminishing returns on the last. Two years ago the difference was HUGE between major releases, and now we're discussing Opus 4.6 Vs. 4.7 and people cannot seem to agree if it is an improvement or regression (and even their data in the card shows regressions).

So my point is: If you have the attitude that unless it is the bleeding edge, it may have well not exist, then local models are never going to be good enough. But truth is they're now well exceeding what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.

gbro3n 65 days ago

I feel like I'm going to have to try the next model. For a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current work flow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see I guess. Don't get me wrong, I'd love to settle down with one model and I'd love it to be something I could self host for free.

dakiol 65 days ago

> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.

Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluating what our time is worth? Do you really think we all can keep our fancy paychecks while keep using AI?

gbro3n 65 days ago

Do you think if you or me stopped using AI that everyone else will too? We're still what we always were - problem solvers who have gained the ability to learn and understand systems better that the general population, communicate clearly (to humans and now AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.

lelanthran 64 days ago

> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.

There were always jobs that required those "many more skills" but didn't require any programming skills.

We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.

Now you are willingly jumping into that position without realising that the lag between your value (i.e. half your salary, or less) would eventually disappear.

gbro3n 64 days ago

I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BA's I've worked with vary in technical capability from 'having coded before and understanding DB schema basics and network architecture' to 'I know how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people to understand whats going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or less of those people, we will see.

aliljet 65 days ago

First, making sure to offer an upvote here. I happen to be VERY enthusiastic about local models, but I've found them to be incredibly hard to host, incredibly hard to harness, and, despite everything, remarkably powerful if you are willing to suffer really poor token/second performance...

wellthisisgreat 65 days ago

> that are last year's SOTA

Early last year or late last year?

opus 4.5 was quite a leap

HWR_14 65 days ago

$10k is a lot of tokens.

sscaryterry 65 days ago

At the rate its consuming now, I'd probably blow $10k in a month easy.