by yesensm 84 days ago
I’m curious whether anyone has measured this systematically. Right now most of the evidence for multi-agent setups still feels anecdotal.
2 comments

And expensive, exactly the way a pay-per-use product would push its customers…

“It’s not working well enough!” we tell them. They respond with “Have you tried using it more?”

Back in 2024 I read a study saying: "Ask 4 LLMs the same question; if they all give you the same answer, there is some 95-99% chance it's correct"

Soooo... It's not just greed. There is something there.
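To make the intuition behind that claim concrete, here is a toy calculation, not the study's method. It assumes each model is independently correct with probability `p`, and that a wrong model picks uniformly among `k` distinct wrong answers. The function name and both parameters are made up for illustration.

```python
def unanimous_posterior(p: float, n: int, k: int) -> float:
    """Chance that a unanimous answer from n models is correct (toy model).

    Assumptions (illustrative, not taken from the study):
    - each model is independently correct with probability p
    - a wrong model picks uniformly among k distinct wrong answers
    """
    if not 0.0 < p <= 1.0 or n < 1 or k < 1:
        raise ValueError("need 0 < p <= 1, n >= 1, k >= 1")
    # All n models land on the correct answer.
    all_correct = p ** n
    # All n models land on the same wrong answer (k ways to choose it).
    all_same_wrong = k * ((1.0 - p) / k) ** n
    return all_correct / (all_correct + all_same_wrong)


# Example: four models, each right 70% of the time, three plausible wrong answers.
estimate = unanimous_posterior(p=0.7, n=4, k=3)
```

Real models share training data, so their mistakes are likely correlated and the true figure would be lower than this independence-based estimate suggests.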

Yes, exactly. I talk about this in the article. I found that when Claude and Codex both review the same PR and both find the same issue, our team fixes it 100% of the time.
What's the point of pair programming then if they both have the same opinions?
They don't. And you would be surprised how a good model actually pushes back on some comments.

The point was: when they do agree, it is a very strong signal.
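A minimal sketch of how "both reviewers flagged it" could be turned into a filter. It assumes each review comment can be reduced to a comparable key. The `Finding` shape, the file names, and the rule labels are all hypothetical and not taken from any real tool's output.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """One review comment reduced to a comparable key (hypothetical shape)."""
    path: str
    line: int
    rule: str


def agreed_findings(first: list[Finding], second: list[Finding]) -> list[Finding]:
    """Return the findings reported by both reviewers, sorted for stable output."""
    return sorted(set(first) & set(second))


# Made-up review output from two models looking at the same PR.
claude = [Finding("api.py", 42, "null-check"), Finding("db.py", 7, "sql-injection")]
codex = [Finding("db.py", 7, "sql-injection"), Finding("ui.py", 3, "naming")]

high_priority = agreed_findings(claude, codex)
```

In practice two models rarely phrase an issue identically, so the hard part is normalizing free-text comments into keys like these. The exact-match intersection here sidesteps that step.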

There are a number of different models out there.
Haha yeah... Wait until they start jacking up the subscription prices
They don't change the prices, they just modify the amount of compute allocated: slower speeds and fewer tokens. They can set everything in the background to optimize costs and returns, and the user never realizes anything has changed.

Sometimes they'll announce the changes, and they'll even try to spin it as improving services or increasing value.

Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.

> They don't change the prices, they just modify the amount of compute allocated: slower speeds and fewer tokens. They can set everything in the background to optimize costs and returns, and the user never realizes anything has changed.

I can't imagine that this is the way it will go... Tokens haven't been getting cheaper for flagship models, have they? You already see something closer to their real cost if you compare e.g. the Claude subscriptions to their actual token pricing.

> Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.

Maybe, but LLMs are a scale game, and a data center will always be more capable than your local device. So you will always be getting a worse version locally. Or do you think LLMs in data centers will stop getting better and local LLMs will somehow catch up?

Completely with you on this! But then we need to define the criteria for comparison. Might not be that easy, unfortunately.