HN Mirror


	by yesensm 84 days ago
	I’m curious whether anyone has measured this systematically. Right now most of the evidence for multi-agent setups still feels anecdotal.

2 comments

not_ai 84 days ago

And expensive, exactly the way a pay-per-use product would push its customers…

“It’s not working well enough!” We tell them. They respond with “Have you tried using it more?”

3yr-i-frew-up 84 days ago

Back in 2024 I read a study saying: "Ask 4 LLMs the same question; if they all give you the same answer, there is some 95-99% chance it's correct"

Soooo... It's not just greed. There is something there.

axldelafosse 84 days ago

Yes, exactly. I talk about this in the article. I found that when Claude and Codex both review the same PR and both find the same issue, our team fixes it 100% of the time.

zombot 84 days ago

What's the point of pair programming then if they both have the same opinions?

axldelafosse 84 days ago

They don't. And you would be surprised how a good model actually pushes back on some comments.

The point was: when they do agree, it is a very strong signal.
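
A minimal sketch of that agreement check, assuming each reviewer's output has already been parsed into (path, line, category) records. The `Finding` type, the `agreed_findings` helper, the sample data, and the line tolerance are all hypothetical names and values chosen for illustration; this is not the tooling from the article.

```python
from typing import List, NamedTuple


class Finding(NamedTuple):
    """One review comment, reduced to fields that can be compared (hypothetical schema)."""
    path: str
    line: int
    category: str


def agreed_findings(first: List[Finding], second: List[Finding],
                    line_tolerance: int = 2) -> List[Finding]:
    """Return findings from `first` that `second` also reported.

    Two findings count as the same issue when they share a file and a
    category and their line numbers differ by at most `line_tolerance`.
    """
    agreed = []
    for f in first:
        for s in second:
            same_place = f.path == s.path and abs(f.line - s.line) <= line_tolerance
            if same_place and f.category == s.category:
                agreed.append(f)
                break  # count each finding from `first` at most once
    return agreed


# Made-up example data standing in for two reviewers' parsed output.
reviewer_a = [Finding("api/users.py", 42, "null-check"),
              Finding("api/users.py", 90, "naming")]
reviewer_b = [Finding("api/users.py", 43, "null-check"),
              Finding("db/query.py", 10, "sql-injection")]

print(agreed_findings(reviewer_a, reviewer_b))
```

Only the overlapping findings are returned; everything else still needs a human judgment call, which matches the point that disagreement is expected and agreement is the signal.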

pixl97 84 days ago

There are a number of different models out there.

shafyy 84 days ago

Haha yeah... Wait until they start jacking up the subscription prices

observationist 84 days ago

They don't change the prices; they just modify the amount of compute allocated: slower speeds and fewer tokens. They can tune everything in the background to optimize costs and returns, and the user never realizes anything has changed.

Sometimes they'll announce the changes, and they'll even try to spin it as improving services or increasing value.

Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.

shafyy 83 days ago

> They don't change the prices; they just modify the amount of compute allocated: slower speeds and fewer tokens. They can tune everything in the background to optimize costs and returns, and the user never realizes anything has changed.

I can't imagine that this is the way it will go... Tokens haven't been getting cheaper for flagship models, have they? You already see something closer to their real cost if you compare e.g. the Claude subscriptions to their actual token pricing.

> Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.

Maybe, but LLMs are a scale game, and a data center will always be more capable than your local device. So you will always be getting a worse version locally. Or do you think LLMs in data centers will stop getting better and local LLMs will somehow catch up?

stackgrid 84 days ago

Completely with you on this! But then we need to define the criteria for comparison. Might not be that easy, unfortunately.
