| HN Mirror



	by camgunz 3 days ago
	Nothing corroborated this. Performance on benchmarks has practically leveled off. The big gains have come from architecture (have a secondary LLM review output) or searching the internet. Also prices are going up. Everything points to the likelihood that we're at the top of the curve.

3 comments

Gareth321 3 days ago

> Nothing corroborated this. Performance on benchmarks has practically leveled off.

[There is plenty of data to support the claim that AI continues to improve, even exponentially.](https://epoch.ai/trends)

As for benchmarks, I feel compelled to remind you that as soon as a metric becomes a goal, it ceases to be a useful metric (Goodhart's law). The models optimise for solving the benchmark and we create new benchmarks to assess broader intelligence. As models converge on 100%, progress obviously slows. That doesn't mean intelligence isn't improving fast. It just means that that benchmark is being well served and we need other benchmarks to assess other forms of intelligence.

I would like to take your bet that we're near the top of the curve. I take the side of Geoffrey Hinton, the Nobel Prize laureate scientist known for his work on artificial neural networks. He believes AI is getting better even faster than he predicted. He estimates that every seven months AI becomes able to handle tasks twice as long.
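For a sense of scale, here is a rough sketch of what a seven-month doubling time would imply if it held constant. The constant rate is an assumption made for illustration, and `horizon_multiplier` is a hypothetical helper name, not something from a cited source.

```python
def horizon_multiplier(months: float, doubling_months: float = 7.0) -> float:
    """Growth factor in manageable task length after `months`,
    assuming one doubling every `doubling_months` (assumed constant)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (7, 12, 24, 36):
        # Print how much longer a task could be after this many months.
        print(f"{months:>2} months: x{horizon_multiplier(months):.1f}")
```

Under that assumption the task length grows a little over 3x per year, which is the kind of curve being argued about in this thread.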


camgunz 3 days ago

> [There is plenty of data to support the claim that AI continues to improve, even exponentially.](https://epoch.ai/trends)

This doesn't look at all exponential to me: https://epoch.ai/benchmarks?view=graph&tab=eci. OpenAI models went from 137 ECI to 159 ECI over about a year and a half, and the trends are similar for Anthropic and Google. These things have never been exponential.
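To put numbers on that, a minimal sketch using only the two figures quoted above (137 and 159 ECI over roughly 1.5 years). It treats the ECI score as a quantity where ratios are meaningful, which is an assumption; the index may not be built for that. The function names are hypothetical.

```python
import math


def annualized_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two scores."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    rate = annualized_growth(137.0, 159.0, 1.5)  # figures from the comment
    print(f"annualized growth: {rate:.1%}")
    print(f"implied doubling time: {doubling_time_years(rate):.1f} years")
```

Read this way, the score grows about 10% per year, so doubling it would take roughly seven years. Whether the score should be read on a ratio scale at all is exactly the open question.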

> The models optimise for solving the benchmark and we create new benchmarks to assess broader intelligence. As models converge on 100%, progress obviously slows.

We are nowhere near 100% on important benchmarks like hallucinations: https://artificialanalysis.ai/evaluations/omniscience?model-...

...also, progress isn't improving with model releases.

---

We're running out of money. While we don't know how much it cost to train things like Claude, most (all?) industry reports indicate that a significant gain in function (2x) would require an exponential amount of resources (20x). No one's yet been able to convince investors that's worth it.
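As a sketch of what "2x for 20x" means if it holds as a power law (gain = cost^alpha): the power-law form is an assumption for illustration, the 2x and 20x figures are the ones quoted above, and the helper names are hypothetical.

```python
import math


def scaling_exponent(gain: float, cost: float) -> float:
    """Exponent alpha such that gain == cost ** alpha."""
    return math.log(gain) / math.log(cost)


def cost_for_gain(gain: float, alpha: float) -> float:
    """Resource multiplier needed for a given gain under gain = cost ** alpha."""
    return gain ** (1.0 / alpha)


if __name__ == "__main__":
    alpha = scaling_exponent(2.0, 20.0)  # 2x function for 20x resources
    print(f"alpha = {alpha:.3f}")
    for gain in (2.0, 4.0, 8.0):
        print(f"{gain:.0f}x gain needs about {cost_for_gain(gain, alpha):,.0f}x resources")
```

Under that assumption each further doubling multiplies the bill by another 20, so 4x costs about 400x and 8x about 8,000x.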

Also, we're running out of data: https://epoch.ai/publications/will-we-run-out-of-data-limits....

Also, we're running out of low-hanging fruit: "We find that the level of compute needed to achieve a given level of performance has halved roughly every 8 months, with a 95% confidence interval of 5 to 14 months. This represents extremely rapid progress, outpacing algorithmic progress in many other fields of computing and the 2-year doubling time of Moore’s Law that characterizes improvements in computing hardware (see Figure 2)." (https://epoch.ai/publications/algorithmic-progress-in-langua...). Maybe you think we'll continue along this breakneck pace, but again no investor thinks that, which is why prices are going up (investment is drying up).
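For comparison, a small sketch of the two rates in the quoted passage over the same two-year window: compute requirements halving every 8 months (with the quoted 5 to 14 month interval) against a 2-year hardware doubling time. It assumes both rates stay constant, and `efficiency_gain` is a hypothetical helper name.

```python
def efficiency_gain(months: float, halving_months: float) -> float:
    """Factor by which required compute shrinks after `months`,
    assuming it halves every `halving_months` (assumed constant)."""
    return 2.0 ** (months / halving_months)


if __name__ == "__main__":
    window = 24.0  # months, one Moore's Law doubling period
    for label, period in (("central estimate, 8 months", 8.0),
                          ("fast end of interval, 5 months", 5.0),
                          ("slow end of interval, 14 months", 14.0),
                          ("hardware doubling, 24 months", 24.0)):
        print(f"{label}: x{efficiency_gain(window, period):.1f}")
```

At the central estimate that is 8x from algorithms against 2x from hardware over two years, which is why the disagreement is about whether that pace can last.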

Also we're running out of compute. Data center projects are stalling. Some of this is spiking energy prices, some of this is politics, much of this is grid constraints and supply chain problems: https://tech-insider.org/us-ai-data-center-delays-cancellati....

---

Finally, and perhaps worst of all, despite unprecedented investment, the data on productivity gains is mixed. This is the biggest difference from other technological leaps like electricity, the industrial revolution, literally fire, etc. Those things were immediately, undeniably more productive. AI is not like that. You're not seeing an AI Microsoft, an AI Salesforce, an AI Oracle, an AI SAP, etc. You can argue that their advantages are structural, but there are no successful AI-powered alternative products (no AI Office, no AI ERP, no AI database, etc).


andy12_ 3 days ago

> Performance on benchmarks has practically leveled off

Ehm, no? DeepSWE[1] for example shows that new models like gpt-5.5 continue to show big improvements compared to older models.

> Also prices are going up.

Prices for frontier intelligence have gone up, but prices for the same level of intelligence have gone way down (what you can get for pennies now was SOTA just a couple of years ago). The Pareto frontier is still expanding.

[1] https://deepswe.datacurve.ai/


snemvalts 3 days ago

Most benchmarks can be trained for as well, so they overstate a model's engineering skills. The entire nature of a benchmark is collapsing some qualitative work (software engineering task, architecture choice, code quality) into a quantitative score which can be optimized for.
