


	by civvv 82 days ago
	There are many indications that model progress is slowing down, so that is not entirely accurate.

3 comments

aspenmartin 82 days ago

Please be specific, because outside of anecdotal blog posts by people who don’t know what they’re talking about, it’s not true. Look at scaling laws or at composite benchmarks such as the Epoch Capabilities Index: nothing at all suggests “model progress is slowing down”.

StrauXX 82 days ago

Which indications are those?

nicoburns 82 days ago

The cost factors on the new models compared to the old models.

jeremyjh 82 days ago

Qwen3.6 9B is as good as GPT-4o and runs on my M2 MacBook Air. Models are getting stronger and less costly at the same time, but these are somewhat separate branches of research. Frontier labs are spending more because they are still getting marginal returns and there is more capacity to spend than there was a year ago.

gertop 82 days ago

Qwen 3.6 9B doesn't exist.

If you meant 3.5 9B and you truly believe it's as good as 4o then I can only assume you have a very basic use case.

jeremyjh 82 days ago

You are right, I was mistaken about the version. I evaluated it on general chat assistant prompts plucked from my history across a range of topics, but did not use it for coding; there was never a time when I thought 4o was “good enough” for agentic coding.

bdelmas 82 days ago

You are mixing up cost and progress. Models getting more and more expensive does not, by itself, mean that progress is slowing down.

nicoburns 82 days ago

They are intrinsically linked beyond a certain point. If we're making progress but costs are spiraling exponentially then it stands to reason that we will soon reach a point where we can no longer afford the increasing costs and thus progress will slow.

(barring some breakthrough that reduces costs, which of course may happen, but recent model improvements are not strong evidence for one)

aspenmartin 82 days ago

The cost of a specific level of performance decreases 10x per year; this has been a pretty consistent property for a while now.
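
To make the arithmetic of that claim concrete, here is a minimal Python sketch. It takes the 10x-per-year figure from the comment above as given rather than as an established fact, and both the starting price and the function name `cost_after` are hypothetical, chosen only for illustration.

```python
def cost_after(initial_cost: float, years: float, yearly_factor: float = 10.0) -> float:
    """Cost of a fixed performance level after `years`,
    assuming it falls by `yearly_factor` every year (the commenter's claim)."""
    return initial_cost / (yearly_factor ** years)


if __name__ == "__main__":
    start = 10.0  # hypothetical starting price, dollars per million tokens
    for year in range(4):
        print(f"year {year}: ${cost_after(start, year):.4f} per million tokens")
```

Under these assumptions, three years of a steady 10x annual decline amounts to a 1000x drop in the price of the same capability, which is a separate quantity from what the newest frontier model costs.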

butlike 79 days ago

I guess within the domain of AI, a pertinent question would be: "do I want to use anything but the best?" In my eyes, the errors older models make are directly analogous to being stupider.

aspenmartin 78 days ago

Depends: many tasks in various pipelines have a reasonable Pareto frontier and diminishing returns after a certain level of performance. You may just have a tight budget constraint (say, YouTube computing ASR subtitles; they are not going to use the best ASR models because those are expensive). If it’s me with a coding agent, I’m going to get the best thing I can afford.

overfeed 82 days ago

Investment dollars.

dzhiurgis 82 days ago

Source for that claim?

lionkor 82 days ago

Nobody is releasing NEW models

aspenmartin 82 days ago

…not only is this not true but it also doesn’t matter. Why would this indicate performance saturating?

taneq 82 days ago

The standard networking connection has been called “Ethernet” for more than thirty years, so networking has stagnated, right?

SlinkyOnStairs 82 days ago

If higher-bandwidth networking consisted primarily of running more and more Ethernet lines in parallel, you would most certainly agree that "networking has stagnated".

"Reasoning" and now "Agentic" AI systems are not some fundamental improvement on LLMs; they're just running roughly the same prior-gen LLMs multiple times.

Hence the conclusion that LLM improvement has slowed down, if not stagnated entirely, and that we should not expect the improvements of switching to these "reasoning" systems to keep happening.

p1esk 82 days ago

From TFA:

“ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove”

SlinkyOnStairs 82 days ago

You misunderstand. I'm not saying that Reasoning/Agentic systems aren't better.

I'm saying they're not an advancement in the tech in the way GPT 1 through 3 were. They're a different kind of improvement.

And as such, the rate of improvement cannot just be extrapolated into the future.

kstenerud 82 days ago

What constitutes a NEW model for the purposes of calculating progress?

GardenLetter27 82 days ago

What? DeepSeekV3 just came out and is incredible for the price. Mythos is also half-released.

nozzlegear 82 days ago

Until you or I can actually use Mythos in Claude without an NDA or other strings attached, Mythos is not released and is just an effective marketing tool for Anthropic.

pixl97 81 days ago

At least to me, this is a pretty sour-grapes take. There are all kinds of released products that are expensive or need an NDA. You're just too poor to afford it. But make no mistake: there are governments using this en masse, and likely against you.

nxobject 81 days ago

I think that’s worthy of at least sour grapes, too.

CuriouslyC 82 days ago

Model progress at spitting out unhallucinated facts is slowing down hard. Model progress at solving hard math challenges/programming tasks doesn't seem to be slowing down, as far as I can tell.