Prediction: Claude 5 will be a major regression

3 points by cadabrabra 132 days ago

At this point it should be completely obvious to everyone that there’s an approximately linear relationship between model cost and model performance. Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance. This is Anthropic’s version of GPT-5, i.e. a way to fool their customers into using a less compute-intensive model, almost purely for the benefit of the company. But as usual, they will rig the benchmarks and make it appear as though the model is better at certain things, like coding.

It’s an illusion, folks. You’re being played. Wake the hell up.

Also, I can’t believe that people still talk about SWE-Bench when there is a paper proving that the benchmark is completely useless because models regurgitate memorized answers.

Again, please, wake up.

https://arxiv.org/abs/2506.12286

2 comments

minimaxir 132 days ago

> Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance.

That's not how LLM quality works.

cadabrabra 132 days ago

Maybe not in theory but definitely in practice, as we’ve seen with GPT-5. These companies are lighting money on fire. If they reduce the cost, expect a proportional decrease in quality. All of the GPT-5 anecdotes confirm this. When the data and anecdotes disagree, the anecdotes are usually right, and the data is usually bullshit.

minimaxir 132 days ago

GPT-5's issues were due to router shenanigans which Claude models do not do.

cadabrabra 132 days ago

No dude, the latest versions of the models it routes to are markedly poorer in performance than their predecessors.

I’m observing a law that states: There appears to be a direct relationship between model performance and cost, such that whenever a company claims to have reduced inference costs, customers immediately notice a corresponding decline in model performance.

bigyabai 132 days ago

> It’s an illusion, folks. You’re being played.

How are they "being played" if Claude 5 isn't even out yet?

cadabrabra 132 days ago

It’s already obvious that it will be a scam. Higher benchmark scores and lower cost are two signs that customers are about to get scammed. We saw it with GPT-5.

Redster 132 days ago

Respectfully,

Claude 3 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.1 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.5 Opus: $5.00 (Input) / $25.00 (Output) per 1M tokens
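To put those list prices in per-request terms, here is a rough sketch. The prices are the ones quoted above; the `cost` helper and the 1M-in / 1M-out workload are made up purely for illustration.

```python
# USD per 1M tokens (input, output), as listed above.
PRICES = {
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 4 Opus": (15.00, 75.00),
    "Claude 4.1 Opus": (15.00, 75.00),
    "Claude 4.5 Opus": (5.00, 25.00),
}

def cost(model, input_tokens, output_tokens):
    """Hypothetical helper: list-price cost in USD for one workload."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Illustrative workload: 1M input tokens and 1M output tokens.
old = cost("Claude 4.1 Opus", 1_000_000, 1_000_000)
new = cost("Claude 4.5 Opus", 1_000_000, 1_000_000)
print(f"4.1 Opus: ${old:.2f}, 4.5 Opus: ${new:.2f}, ratio: {old / new:.1f}x")
```

So the price stayed flat across three Opus releases and then dropped to a third.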

cadabrabra 132 days ago

This actually proves my point because if you read the anecdotes, you will notice a marked decline in performance. The version number goes up but the actual performance declines. The benchmarks can tell any story you want them to.

bigyabai 132 days ago

Is it? It might be possible that it's a scam, but for something to be "obvious" it has to be released first.

There are plenty of ways to reduce inference cost for a high-intelligence model. Making the weights sparser, for example, can increase the parameter count while reducing the inference cost and time.
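A back-of-the-envelope sketch of the sparsity point, assuming a mixture-of-experts style feed-forward layer where only `top_k` of `n_experts` experts run per token. All sizes here are invented for illustration and say nothing about how any real Claude model is built; attention and router parameters are ignored.

```python
def dense_ffn_params(d_model, d_ff, n_layers):
    # One feed-forward block per layer: up-projection plus down-projection.
    return n_layers * 2 * d_model * d_ff

def moe_ffn_params(d_model, d_ff, n_layers, n_experts, top_k):
    # Each expert is its own feed-forward block; only top_k run per token.
    per_expert = 2 * d_model * d_ff
    total = n_layers * n_experts * per_expert   # parameters stored
    active = n_layers * top_k * per_expert      # parameters used per token
    return total, active

# Hypothetical configurations, chosen only to make the arithmetic clear.
dense = dense_ffn_params(d_model=4096, d_ff=16384, n_layers=32)
total, active = moe_ffn_params(d_model=4096, d_ff=4096, n_layers=32,
                               n_experts=16, top_k=2)

print(f"dense:      {dense / 1e9:.2f}B params, all used per token")
print(f"MoE total:  {total / 1e9:.2f}B params ({total / dense:.1f}x dense)")
print(f"MoE active: {active / 1e9:.2f}B params ({active / dense:.1f}x dense)")
```

In this toy setup the sparse model stores 4x the feed-forward parameters of the dense one but touches only half as many per token, which is roughly what drives per-token compute.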

cadabrabra 132 days ago

I get what you’re saying, but I still think that it will be a scam. Bookmark this thread and let’s continue the conversation after it’s released.

bigyabai 132 days ago

I think you are driven more by an emotional interest than a technical one here. You've written several such posts, and many of them are astronomically unlikely predictions.

cadabrabra 132 days ago

Ok but didn’t Karpathy make it clear that we live in the vibe era? I’m inclined to trust vibes more than technical jargon, and boy are the vibes off with what’s been happening!

Let’s see what happens :)
