by thefourthchime 81 days ago
I won’t use anything less than the SOTA. I tried using Opus 4.6 medium and immediately regretted it. High messes up enough.
2 comments

What were you using 6 months ago?
Opus 4.5 ~= Opus 4.6 high. Opus 4.5 was nerfed just before or after the release of 4.6.
The models don’t change.
On paper. There's huge financial incentive to quantize the crap out of a good model to save cash after you've hooked in subscriptions.
And there’s an incentive to publish evidence of this to discourage it, do you have any?
Models aren't just big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.
There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.

[1]: https://marginlab.ai/trackers/claude-code/

Anybody with more than five years in the tech industry has seen this done in all domains time and again. What evidence do you have that AI is different, which is the extraordinary claim in this case...
Or just change the reasoning levels.
They do. I'm currently seeing a degradation on Opus 4.6 on tasks it could do without trouble a few months back. Obviously I'm a sample of n=1, but I'm also convinced a new model is around the corner and they preemptively nerf their current model so people notice the "improvement".
Make that 2, I told my friends yesterday "Opus got dumb, new model must be coming".
I swear that different sessions will route to different quants. Sometimes it's good, sometimes not.
Real world usage suggests otherwise. It's been a known trend for a while. Anthropic even confirmed as such ~6 months ago but said it was a "bug" - one that somehow just keeps happening 4-6 months after a model is released.
Real world usage is unlikely to give you the large sample sizes needed to reliably detect the differences between models. Standard error scales as the inverse square root of sample size, so even a difference as large as 10 percentage points would require hundreds of samples.
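
To put rough numbers on the sample-size point, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The 50% and 60% pass rates, the 5% significance level, and the 80% power are illustrative assumptions, not figures from any provider or benchmark.

```python
import math
from statistics import NormalDist


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def samples_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Trials needed in EACH arm to distinguish pass rates p1 and p2
    with a two-sided z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_power * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Quadrupling the sample size only halves the standard error.
    for n in (25, 100, 400):
        print(f"n={n:>3}: standard error = {standard_error(0.5, n):.3f}")

    # Assumed example: a 10 percentage point gap (50% vs 60%).
    print("samples per arm:", samples_per_arm(0.50, 0.60))
```

Under these assumptions the answer comes out at a few hundred trials per arm, and it grows quickly as the gap shrinks, which is why casual day-to-day usage rarely settles the question.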

https://marginlab.ai/trackers/claude-code/ tries to track Claude Opus performance on SWE-Bench-Pro, but since they only sample 50 tasks per day, the confidence intervals are very wide. (This was submitted 2 months ago https://news.ycombinator.com/item?id=46810282 when they "detected" a statistically significant deviation, but that was because they used the first day's measurement as the baseline, so at some point they had enough samples to notice that this was significantly different from the long-term average. It seems like they have fixed this error by now.)
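
As a rough illustration of how wide those intervals are, the sketch below computes a simple Wald (normal-approximation) confidence interval for a pass rate measured on 50 tasks. The 56% pass rate is an assumed example value, and pooling several days assumes the underlying rate did not change over that period; the tracker's actual methodology may differ.

```python
import math
from statistics import NormalDist


def ci_half_width(pass_rate: float, n_tasks: int, confidence: float = 0.95) -> float:
    """Half-width of a Wald confidence interval for a pass rate
    observed over n_tasks independent tasks."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


if __name__ == "__main__":
    assumed_rate = 0.56   # illustrative assumption
    tasks_per_day = 50    # sample size mentioned above
    for days in (1, 7, 30):
        hw = ci_half_width(assumed_rate, tasks_per_day * days)
        print(f"{days:>2} day(s) pooled: +/- {100 * hw:.1f} percentage points")
```

With a single day of 50 tasks the interval spans well over ten percentage points in each direction, so a day-to-day swing of a few points is within noise.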

It's hard to trust public, high profile benchmarks because any change to a specific model (Opus 4.5 in this case) can be rejected if they have regressions on SWE-Bench-Pro, so everything that gets to be released would perform well in this benchmark
Well, I don't see 4.5 on there ... so I'm not sure what you're trying to say.

And today is a 53% pass rate vs. a baseline 56% pass rate. That's a huge difference. If we recall what Anthropic originally promised a "max 5" user https://github.com/anthropics/claude-code/issues/16157#issue... -- which they've since removed from their site...

50-200 prompts. That's an extra 1-6 "wrong solutions" per 5 hours ... and you have to get a lot of wrong answers to arrive at a wrong solution.
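
The arithmetic behind that "1-6" range can be sketched as follows. It assumes, as a simplification, that every prompt is an independent attempt whose failure probability shifts by the same 3 percentage points as the benchmark pass rate; real prompts are not benchmark tasks, so treat this as a back-of-the-envelope estimate only.

```python
def expected_extra_failures(baseline_rate: float, current_rate: float, prompts: int) -> float:
    """Expected number of additional failed attempts when the pass rate
    drops from baseline_rate to current_rate over `prompts` attempts."""
    return (baseline_rate - current_rate) * prompts


if __name__ == "__main__":
    baseline, today = 0.56, 0.53   # pass rates quoted above
    for prompts in (50, 200):      # prompts per 5-hour window quoted above
        extra = expected_extra_failures(baseline, today, prompts)
        print(f"{prompts} prompts: about {extra:.1f} extra failures")
```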

Only nominally...
Oh yes, they do.
I think the conspiracy theories are silly, but equally I think pretending these black boxes are completely stable once they're released is incorrect as well.
No conspiracy theories. Companies being scumbags, cutting corners, and doctoring benchmarks while denying it. Happens since forever.
You cannot afford the SOTA.
Why is that? The $200 per month subscription comes with a ton of usage.

Opus 4.6 is available on the $20 plan too

> The $200 per month subscription comes with a ton of usage.

$200 dollars + VAT is half of my rent.

I know HN is not a good place to rant on this subject, but I'm often flabbergasted about the number of people here that lives in a bubble with regard to the price of tech. Or just prices in general.

I remember someone who said a few years ago (I'm paraphrasing): "You could just use one of the empty rooms in your house!". It was so outlandish I believed it was a joke at first.

EDIT: "not", minor grammar

The other part of the bubble is assuming working in projects that allow disclosing any code or project details to a generic third party with that kind of power asymmetry.
Thanks for the alternative perspective.

I think I am in the middle. I can afford $200/m but it'd be a brainer. And I don't pay that as I barely use home AI enough to warrant it.

I am also amazed at the richer end of HN but now I realize I am privileged. Earned it? Like fuck I did. Lucky to be born a geek in late 20c. I'd be useless as a middle ages guy.

If I found myself in the middle ages I’d just become a blacksmith or a miller.
Do you have the genetics for that? It takes a lot of raw strength, and not that much intelligence.
That's why ai is for the "rich". Poor people or later on middle class will be left behind....
Nah, that's why you cannot not afford the subscriptions these days. Whatever your needs, ever since Claude Code became a thing, subscription costs come out massively cheaper than pay-as-you-go per-token API pricing. Also SOTA models are so much better than anything else, that using older or open models will just cost you more in tokens/electricity than going for SOTA subscription.

Subscriptions are definitely middle-class targeted. $20/month is not much for the value provided, at least not in the western world.

But if by "rich" you just mean "westerners", then in this sense, the same is and has always been true for computing in general.

The subscriptions are purposely sold for less than cost. The subsidy will end some day.
Not sure. AI is sort of car ownership price. I think while that ain't poor, that is middle class.

So like if you want to start a business of any sort the AI sub is still peanuts.

AI is a car, or a dog, or a mild social life, or a utility bill level of cost. And that's for the level needed for a sane typical developer. (AI maximalists need 250k/y, let them slop it out)

It is not a Cessna, an infinity pool or a 1 month vacation.

It’s a good reminder. Claude Max costs about as much as the global poverty line ($3/day.) I think it’s okay to invest in it, but we should try to make sure it’s worthwhile, and also invest in charity.
$200/mo is a lot, sure, but the shocking part of that comparison is your rent. I didn’t know $400/mo apartments still existed. For most people in the US and EU, $200 would be closer to 15%-20% of rent I think? My cell phone bill for my family is almost $200/mo.

Last year, at first, $200 seemed crazy. Now that I’m getting addicted to coding agents, not so much. Some companies are paying API rates for AI for employees, and it’s a lot more than $200/mo. It seems like funny money, and I’m not sure it’ll last.

As you've probably guessed, I don't live in the US, so prices are drastically different. I live in the EU. And in my case, I have lived in a really small flat for some years, so the rent couldn't go up a lot.

> most people in the US and EU, $200 would be closer to 15%-20% of rent I think?

> the average rent is north of $1000/mo.

I really don't know where you get your number from, $1000/mo average is really wild to me. With this amount, you can rent a flat for a whole family in the heart of the city. None of my more well-off friends has rent this high.

Or maybe you have some capital city in mind like Paris or London?

A friend’s 2BR in Palo Alto is $6K/mo. It’s a cute little mid-century house with a small backyard, but no AC or garage.

The salaries are good in SCV, but the local economy is calibrated to absorb the money in proportion.

> I really don’t know where you get your number from

I googled it. According to Google, London’s average rent is around €2,700, around 3x higher than the average. I assume the number of people living there and paying that much balances against the number of people like you living in smaller towns and rural areas who are paying lower rents.

But yes, rents have become very high everywhere. I live in a medium sized city in the US not anywhere near a coast, and most kids attending the local university are paying over $1000/mo for a 1-bedroom place. The primary way to get cheaper rent is to have flat-mates, try to get 3 or 4 people into a place that rents for, say, $2500/mo.

I was paying $2k/mo in San Francisco 25 years ago for a place that was maybe 90m^2, and since then rents have gone way up. Google says the average now is just under $4k/mo. In some nicer neighborhoods, some people pay $8k/mo for a single bedroom. This big-city rent in SF, LA, NY, Chicago, Miami, etc. balances against the small towns in the US where you can find a room for $500/mo, which is why the average is above $1k.

It is my belief that rent price scales with the leftover income people have after they've paid for other necessities. Ie if you're from a poorer country/area then things like milk and gasoline will cost a similar amount (maybe 2x difference), but rent will cost a lot less. As people in a country get richer they start paying a larger and larger share of their income as rent of various forms.

Even the US has places with cheap rent/housing. The downside is that there's no (well-paying) work nearby.

It’s true that average rent prices are regional and poorer areas have lower rents, but that doesn’t tend to make much difference in urban areas and large cities where the majority of people live now. Why do you feel that rent scales with disposable income? Economists generally say the opposite based on housing being a core necessity; that people pay rent in proportion to their income, and only what’s left over is the disposable amount. That’s why we have the 30% rule, for example.

You’re technically correct, btw, rental housing is a market and is subject to market forces, meaning what people are willing to pay. I’m just not so sure about framing rent as being lower priority than other necessities. And rent prices have been increasing faster than other necessities, and faster than income, so that might be a confounding factor in your argument.

Still, my initial reaction above is due to the fact that in the US and in Europe in most large cities, the average rent is north of $1000/mo.

In the US/Western Europe? Because for devs especially in the former, $200 is pocket change, especially for a core productivity tool. And the rent would be in the $1200 to $3000 easily. Same for houses. Maybe not in NY or SF, but in most of the US there's no shortage of house spaces and redundant rooms.
I've seen those comments about $200/month and empty rooms here, so I suppose they mainly come from the US, yes.

So yes, you describe a situation that I feel like a lot of people here don't understand is not the norm.

I compared the subscription with my rent precisely because it's easier to compare: with your numbers it would be like paying from $600 up to $1500 / month. Pretty hard to justify.

> Because for devs especially

Are you not a dev? If not, what would you use a coding tool for? They still require handholding for anything largeish. Still much cheaper than outsource.

You think I don't understand that? I'm friends with people who make little more than that amount per month.

But it's not all that relevant to this conversation. It's not like this is the first time economic inequality is a thing.

It's just as relevant to me factoring in your salary the next time I go buy a car.

First, I've assumed you were in the bubble I described, but that's not the case, so sorry bout that.

Also, I think it's relevant to the conversation.

You replied to someone who said that "you" (undirected pronoun I suppose) can't afford the SOTA that the $200/month Anthropic subscription comes with a ton of usage. So I interpreted it as a general statement. It wasn't what you meant?

I'm a bit lost about who you're talking to/about in your first comment: the person you respond to, a general statement for everyone reading, or yourself?

I assume when somebody says you and is not talking about anyone in particular they mean that it's infeasible for virtually everybody which is certainly not the case. Also you conveniently disregarded the fact that it is available on the $20 per month plan.
For me I pass the token costs off to my clients. Not everyone is a hobbyist burning their own cash on personal projects
Work pays.
I'm not sure I've correctly understood what you're implying.

If it's that I'm not working, well, I'm employed.

If it's that I'm not working enough to not have this money... Well, we still go back to the bubble. Not everywhere in the world you can easily find a job that pays you enough, even if you accept to work more. And the employer will not accept to give developers a $200/month subscription, even less for personal use.

If it's that I'm not working enough and I should go freelancing to work as much as I want and get rich (I'm extrapolating). Well, you're right, I could do that. But (at least at first), I would work a lot more for much less money. And even if I become a recognized freelancer, it doesn't change the fact that I'll earn less money compared to the baseline of SF, or even the USA in the tech sector in general. So, bubble again. I could also, like someone said, put the tokens cost into my hourly/daily rate, but I'll be much more expensive than other freelancers.

Also, but that's a "me case" compared to my previous points, health issues can greatly affect how much work you can do.

> I could also, like someone said, put the tokens cost into my hourly/daily rate, but I'll be much more expensive than other freelancers.

Do you have any evidence of that? I think the OPs are assuming this as a premise so their logic is probably valid but may not be sound logic for you.

I guess what was meant is that those tools are generally bought by the employer
Calm down. I meant that my work covers my pro subscription.
>I'm often flabbergasted about the number of people here that lives in a bubble with regard to the price of tech

Sorry, no. You live in the bubble, the people you think are living in a bubble are actually doing the very opposite and taking advantage of the lack of bubbles in our globally connected world.

Today, basically anyone can sell any bullshit to billions of people around the world. We’ve never lived in less of a bubble.

I guess all those people who live in not-SF just can't be bothered to succeed!
$20/month is not above middle class in most of the world.

$200/month is, but you don't need that for anything except beyond-casual use of coding agents.

I’ve never been to SF, wouldn’t know anything about it.
To be fair if you think only people in SF can afford that you do kind of live in a bubble.
I'm starting to think in these conversations we're all often talking about two different things. You're talking about running an LLM service through its provided tooling (codex, Claude, cursor), others seem to be talking token costs because they're integrating LLMs into software or are using harness systems like opencode, pi, or openclaw and balancing tasks across models.
Fair enough, I read it quickly and assumed the person they replied to was talking about Claude Code

But I run an AI SaaS and we do offer Opus 4.6, too. Our use case is not nearly as token intensive as something like coding so we are still able to offer it with a good profit margin.

Also you can run OpenClaw with your CC subscription. It's what I do.

I wrap Opus 4.5 in a consumer product with 0 economic utility and people pay for it, I'm sure plenty of end users are willing to pay for it in their software.

Edit: I'm not using the term of art, I mean it literally cannot make them money.

> [...] in a consumer product with 0 economic utility and people pay for it, [...]

Sorry, how do these two things go together?

If people pay for it, it has economic utility, doesn't it? I mean, people pay to watch movies or play video games, too.

A subscription for coding - no thanks.
If you think it's only for coding you don't have much of an imagination :)
These are the types of individuals that become so left in the dust that they don't realize what's going on anymore, and it's obvious this person is already there. Claude hasn't been a "subscription for coding" product for quite some time now. That's how it started out and while that's certainly what Claude is known for, Anthropic has been pushing for Claude to also be a general productivity tool -- Claude Code, then Claude Desktop, Claude Work, and now Claude Desktop has Chat, Work, and Code essentially built into a single desktop app that just works wonders for those who are looking for a general productivity tool.

I'd not use it over pure Claude Code because I am at heart a coder and I want the raw terminal experience and there's some features missing from the "Code" tab in Claude Desktop, but just saying "a subscription to code", just goes to show how out of touch that person already is, and that's what resistance does to you when you try to resist making use of any kind of modern tooling or technology.

I should have been more precise - I don't want to depend on paying to a third party to be able to do meaningful coding.

They can take that away from you at any time for any reason, make it too expensive, etc.

A working PC with a Linux distro has been enough and should be enough. Everything else is a time bomb.

I dunno how you guys even go through the $200 subscription. I use it every day for work and side projects doing tasks in parallel and I'm nowhere near the limit on $100.
> The $200 per month subscription comes with a ton of usage.

200 USD/month is a number only really affluent programmers (e.g. in the Silicon Valley) can perhaps pay easily.

The $100 already gives plenty of usage and is more than worth it, and I'm definitely not an affluent SV developer. I've only ever hit the 5h limit once in the last month, although I rarely run more than 3 agents at once, and I don't use ridiculously expensive tools like Gas Town.
> 200 USD/month is a number only really affluent programmers (e.g. in the Silicon Valley) can perhaps pay easily.

Not true, I live in USA PNW and my last remote job paid $12k/mo. I have been jobless for over a month now (currently waiting for the next HN "who wants to be hired"), but I still have enough savings to easily afford to continue that plan for a while.

I don't think it really has to do with affluence but more the job market and economy you're in. Countries with lower salaries or higher costs of living will have less buying power.

"Opus 4.6 is available on the $20 plan too"
Anthropic’s $20 plan gives you such a pittance of tokens that it’s borderline unusable for anything more than a few scripts or a toy app. If $20 is all you have you’d do _much_ better going with chatgpt
The Codex plan for the $20 ChatGPT plan goes much further than Claude's $20 plan, but it's still not enough if you plan to work full-time with it.
My usage is in the $60 tier, but that doesn't exist so I have to cough up $100. And then get all shaky if I don't use up my weekly quota.
That's simply not true at all.
Are you kidding me? Even developer salaries in the Philippines can afford that or at least the plan below it. If I used the Anthropic API, my monthly spend would be $4k a month. The Claude Max plan is the best bargain around.