by aspenmartin 9 days ago
It's nice that people are genuinely curious about this.

- All of your observations are absolutely dead on

- Yet we have very, very robust scaling laws that, as Dario points out, we've had and validated for over a decade. This extends to downstream measures like METR time horizon and composite benchmarks like the Epoch Capability Index.

- If you look at where you're at now, which is again dead on, you're looking at a point on a curve that is quite easy to extrapolate. What is harder to tell is exactly where on the curve a given capability or use case undergoes a step change, because that happens when error rates drop below a threshold that is hard to anticipate in advance.

So while Dario / other frontier CEOs are understandably unpalatable, they are absolutely spot on with a call out that all of this is bound to happen and happen quickly, and that's without solving several core problems that haven't been solved yet (e.g. continual learning). In 2023, coding agents were just laughable. Yet they followed the same predictable training curves. Anyone looking at the data can see the obvious, and anyone reading newspaper headlines or hacker news comments would get a very different impression.


Are we plotting against cost? How is the capability advancement vs dollars paid for development?

By my read of the (very sparse) data, we're getting linear improvements in capability for super-linear increases in costs. [1] indicates that by 2027 models will cost $1 billion to train. Dario estimates that model runs will cost $10 billion in 2026 [2]. That to me indicates costs are potentially growing faster than capability. Maybe by quite a bit.

If the value prop of LLMs doesn't prove out, that won't last. I'm of the opinion there is no data that shows actual economic value being delivered by models. The best data shows that LLM use might be destroying value [3].

[1] https://epoch.ai/publications/how-much-does-it-cost-to-train... [2] https://lexfridman.com/dario-amodei-transcript/ [3] https://unessays.substack.com/p/talk-is-cheap

I appreciate the data here, but I don't think the read is quite right:

Saying we have linear capability for super-linear cost compares an unbounded variable (dollars) to bounded instruments (because benchmarks saturate). On unbounded measures, growth is exponential; you can see METR time horizons double every ~4-7 months (https://metr.org/blog/2026-1-29-time-horizon-1-1/). And capability being proportional to log(compute) is what the scaling law predicts.
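
To make that arithmetic concrete, here is a minimal sketch. It converts a doubling time into a yearly growth factor, and shows why capability proportional to log(compute) looks like "linear capability for super-linear cost". The function names and the slope `k` are illustrative assumptions, not values taken from METR or from any particular scaling-law paper.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple over 12 months for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 / doubling_months)


def capability_gain(compute_ratio: float, k: float = 1.0) -> float:
    """Capability increase if capability = k * log10(compute); k is a made-up slope."""
    return k * math.log10(compute_ratio)


if __name__ == "__main__":
    # The ~4-7 month doubling range quoted above, expressed per year.
    for months in (4, 7):
        print(f"doubling every {months} months -> x{annual_growth_factor(months):.1f} per year")
    # Under the log assumption, each 10x of compute adds the same fixed increment.
    for ratio in (10, 100, 1000):
        print(f"{ratio}x compute -> +{capability_gain(ratio):.1f} capability units")
```

A 4-month doubling time works out to 8x per year and a 7-month doubling time to roughly 3.3x per year, so the quoted range is wide but exponential at either end.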

Epoch puts training cost growth at ~2.4x/year as your link shows. Meanwhile cost for fixed capability falls ~10-40x/year (https://epoch.ai/data-insights/llm-inference-price-trends), and lab revenue is growing ~10x/year! Anthropic went from $1B to $9B to $30B+ run rate in ~15 months, OpenAI ~$25B.
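
One rough way to sanity-check these rates is to annualize them. The sketch below assumes the ~15 months covers the whole $1B to $30B move, which the figures above do not state precisely; `annualized_factor` and `compound` are hypothetical helper names.

```python
def annualized_factor(start: float, end: float, months: float) -> float:
    """Per-year growth multiple implied by going from `start` to `end` in `months` months."""
    return (end / start) ** (12.0 / months)


def compound(rate_per_year: float, years: float) -> float:
    """Total multiple after `years` of growth at `rate_per_year` (e.g. 2.4 for 2.4x/year)."""
    return rate_per_year ** years


if __name__ == "__main__":
    # Assumption: $1B -> $30B run rate over roughly 15 months.
    revenue = annualized_factor(1.0, 30.0, 15.0)
    print(f"run rate: ~x{revenue:.0f} per year")
    # Training cost growing 2.4x/year, compounded over three years.
    print(f"training cost after 3 years at 2.4x/year: x{compound(2.4, 3):.1f}")
```

Under that assumption the run rate annualizes to roughly 15x per year, against about 14x for training cost over three full years at 2.4x/year.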

On [3]: the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev. The RCT evidence is genuinely mixed (METR: -19%, with n = 20 and Claude 3.x; Cui et al.: +26%), but it's just super hard to do this well. I think the Faros stuff was pretty cool. I hadn't seen this before, so thank you for the reference.

>"On unbounded measures, growth is exponential"

Maybe. There was a great comment in the thread on Fable 5 yesterday about benchmark comparisons between Fable and the latest Opus models. Here it is: https://news.ycombinator.com/item?id=48464600.

You could be right, but this is the most direct benchmark comparison I could find and it's not that strong.

>the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev.

I discuss this directly in my analysis. There's also an 860% code churn increase ratio. You only need 9% of that to be allocated to wasteful rework to drive throughput flat to the 15% rework baseline. Not to an assumed ideal state where there was no rework.

But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.

I appreciate you reading my stuff and taking the data seriously. Thank you.

  > But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.
n=1 but at $JOB we have throughput quotas now, and what is happening is that teams are just finding lots of busywork (renaming things, gardening of AI .md files, rewriting UIs etc) and also dividing PRs into smaller chunks to match the quotas... so even a "throughput increase" doesn't say much if it's not for improving the customer outcome (IME anyways)
Productivity != value.

Thanks for the story.

METR's time horizon is not a reliable metric of LLM capability growth: https://www.transformernews.ai/p/against-the-metr-graph-codi...
Yes I've seen this before, and while the critiques are fair and high quality (and unfortunately not unique to METR) we're missing the forest for the trees here.

First of all, if you take the article's critiques and work out the implications for the METR graph, all you're doing is shifting the curve up or down; it doesn't change the fact that progress is scaling exponentially. While it is technically possible the universe could be throwing a massive pathological curveball that changes the conclusion from the METR data (which is that we've been seeing exponential growth over the last 6 years), I think that seems very far from likely. The fact that we see the same behavior from a variety of sources over a wide variety of tasks and domains is a pretty clear indication that METR, while certainly far from perfect, is actually painting a consistent picture, at least in terms of the rate of progress.
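
The "shifting the curve up or down" point can be sketched numerically. The example below uses made-up time-horizon data, not METR's, and assumes the critiques amount to a roughly constant multiplicative bias. A bias that changed over time would alter the slope, and this sketch does not cover that case. `fit_line` and `doubling_time_years` are hypothetical helpers.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def doubling_time_years(times, horizons):
    """Doubling time implied by a straight-line fit to log2(horizon) against time."""
    slope, _ = fit_line(times, [math.log2(h) for h in horizons])
    return 1.0 / slope


# Hypothetical data: a task horizon (in minutes) that doubles every 6 months.
years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
horizons = [2.0 ** (t / 0.5) for t in years]

# Suppose every measurement is overstated by the same factor of 2.
corrected = [h * 0.5 for h in horizons]

if __name__ == "__main__":
    print("doubling time, original data: ", doubling_time_years(years, horizons))
    print("doubling time, corrected data:", doubling_time_years(years, corrected))
```

On a log scale a constant factor only moves the intercept, so both fits give the same doubling time.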

You can look at ECI for a summary benchmark statistic, which does NOT use METR's benchmark, and you see a similar trend. Same with SWE-bench where the task distribution is far more in domain for real world problems. It is a bummer that this METR data can't be better funded. It would probably take $1M or so to really beef it up properly which any of these labs probably have in their couch cushions.

Wow. This deserves to be much more widely read. Thank you for this.
>By my read of the (very sparse) data, we're getting linear improvements in capability for super-linear increases in costs. [1] indicates that by 2027 models will cost $1 billion to train. Dario estimates that model runs will cost $10 billion in 2026 [2]. That to me indicates costs are potentially growing faster than capability. Maybe by quite a bit.

This is true and well established.

As long as you get any improvement whatsoever, it is worth spending to train, since it pays off during deployment.

Imagine training was not $1 billion but $100 billion, and the performance improved by just 10%. This is still worth it because you can squeeze out the profits across years and years, right? The improvement is everlasting.

> The best data shows that LLM use might be destroying value [3].

This is basically a conspiracy theory and if you really believed this, you should not have led with "How is the capability advancement vs dollars paid for development?" because if there were no value, it doesn't really matter how much you invest.

>This is basically a conspiracy theory

I think this is pretty uncharitable, especially when I've provided you with a dataset you can evaluate yourself and an argument you can review for logical inconsistency.

I have worked quite hard to locate data that supports your thesis, I can't find it. I've at least gone to the effort of documenting that search. Before you throw around such strong convictions, I suggest you actually look for yourself.

Respectfully, your link is not very convincing.

But what’s interesting is that you are commenting on a post where Dario is suggesting that LLMs are so extremely powerful that they can take over, help synthesise bioweapons, help in warfare, help in drug discovery — the whole post here is to try and regulate this. If you believe AI can’t even create positive value let alone discover new things then your problem is somewhere else and not in something like “but training costs a lot”.

So it is absolutely strange and contrasting to see you believe that LLMs are so weak as to create negative value while the CEO is asking about regulations because AI is too powerful.

I don’t think I can convince you that AI is actually that powerful.

But let me ask you something directly: if you believe what you believe, you should also acknowledge that AI doesn’t need regulations in the context Dario is proposing since obviously AI can’t do anything he predicts. Do you agree?

> So it is absolutely strange and contrasting to see you believe that LLMs are so weak as to create negative value while the CEO is asking about regulations because AI is too powerful.

You wouldn't ask a chemistry professor to write code. So just because LLMs create negative value for software development doesn't mean that they can't be helpful for bioweapons synthesis, especially considering the range of chemistry and biology sources Anthropic would have fed to its LLM that wouldn't be publicly accessible. The LLM doesn't even need to be particularly accurate so long as the amateur bioweapons researcher takes adequate precautions before following its instructions and does some background research beforehand.

This is a ridiculous stance to take. That LLMs are simultaneously negative value but can also help synthesise bioweapons. It’s the sort of stance you take when you already feel ideologically against AI. I don’t think it’s coherent.
>Respectfully, your link is not very convincing.

I'd love to understand why. This would be valuable feedback for me as I try to make my writing and exposition better. Also, if you have other data, that also would be valuable for me to know.

>if you believe what you believe, you should also acknowledge that AI doesn’t need regulations in the context Dario is proposing since obviously AI can’t do anything he predicts. Do you agree?

I think you misunderstand my beliefs. On net I think how we're using LLMs destroys value. That doesn't mean no one ever gets value from LLM use.

My particular point about trillion dollars is - the main place Anthropic, OpenAI, and - hilariously - SpaceX think they will drive value creation is in enterprise applications. In that domain I think the evidence is very convincingly negative. I'm certainly not the only person who thinks this. It's pretty well accepted in economics right now that there is no observed organizational level productivity improvement. Lines break down on whether it will show up eventually or whether we will wait forever.

My belief about LLM value is that it's most useful for individuals and small teams. Places where coordination and trust are easily established and feedback loops to value creation are tight. They are "short range" as it were.

Their value starts to erode as soon as a user becomes disconnected from the point of direct value creation. Which is pretty much everyone who works inside of a large organization. It becomes negative at pretty small scale, IMO. I do think there are patterns of use that could drive value at these scales. I talk about that in my post.

On Bioweapons in particular, I could see small teams of people working to build something very dangerous. Having spent my formative academic years in a biochemistry and microbiology lab though, I do think the danger is overstated. Papers are not know-how or equipment. There's a lot of tacit knowledge that can't get written down that is super hard to acquire.

But, I'd be happy for us to regulate AI for dangerous applications.

My question would be - why would Anthropic build something they so clearly think is dangerous? If they were really building something deserving of the valuation they have, why build applications like this?

To my eyes - it's super weird that a company would build something they think is dangerous and turn around and beg the governments of the world to stop them. That's really strange behavior from my perspective.

I went through your post in substack (I think that's what you were referring to).

> I'd love to understand why. This would be valuable feedback for me as I try to make my writing and exposition better. Also, if you have other data, that also would be valuable for me to know.

I think it comes down to a few things:

- you took a single report that agreed with your statistics; for the sake of argument let's say I buy it completely

- you suggest that net value is lost simply because there are more incidents. this is a big jump

- you say that historically different technological improvements may have had similar patterns but this specific one is different because AI is stochastic

So it all really rests on you finding one distinction with AI and then disagreeing with the past trends.

I agree AI is stochastic and I'll put it this way: it is a high variance bet but it pays off. This is a bit hard for people to understand -- it's a tool that works really nicely sometimes and fails other times. Overall you are better off using it, but you need to use it enough to reduce variance.
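
The "use it enough to reduce variance" point can be sketched with a toy model. All numbers here are hypothetical: each use saves 3 hours with probability 0.4 and wastes 1 hour otherwise, so the expected payoff per use is positive. The function computes the exact binomial probability of ending up net negative after a given number of independent uses.

```python
from math import comb


def prob_net_loss(n_uses: int, p_win: float = 0.4, gain: float = 3.0, loss: float = 1.0) -> float:
    """Probability that total payoff over `n_uses` independent uses is below zero."""
    total = 0.0
    for wins in range(n_uses + 1):
        net = wins * gain - (n_uses - wins) * loss
        if net < 0:
            # Binomial probability of exactly `wins` successes in `n_uses` tries.
            total += comb(n_uses, wins) * p_win ** wins * (1 - p_win) ** (n_uses - wins)
    return total


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} uses -> P(net loss) = {prob_net_loss(n):.4f}")
```

With these made-up payoffs a single use loses 60% of the time, while the chance of being behind after many uses shrinks toward zero. The conclusion depends entirely on the expected payoff being positive, which is the point under dispute in this thread.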

Let me ask this: if you are so sure this won't lead to enterprise-level productivity, how do you think this will show up in macro trends? Surely you must believe that the valuations will drop, wouldn't you? Can you come up with a concrete future scenario that would vindicate your opinion that AI doesn't make enterprises more productive?

> My question would be - why would Anthropic build something they so clearly think is dangerous? If they were really building something deserving of the valuation they have, why build applications like this?

I think this is a fair and interesting question. Here is what I think they think: if they don't build it, someone else might. And they think they are more moral than others. If they have a head start they can set the political and regulatory landscape.

You have engaged in good faith discourse, thanks! I'll reply in a bit.
That’s interesting. I commented on this elsewhere, but the part of the exponential argument that loses me is that it can often seem like a way to distract from issues that already exist and that we should be working to fix. Things like autonomous weapons or mass surveillance are already here and rather terrifying, and I would hope that we would dedicate our time to fixing those rather than having industry leaders focus so much on hypotheticals. While I guess the hypothetical scenario could be so bad that we must focus on it, I imagine a world which can’t come up with a way to spread wealth more equally or prevent mass proliferation of surveillance technology through profit-seeking behavior will not be able to handle a digital superintelligence. So I keep coming back to the question: why is all I hear these industry leaders talking about is the threat of extinction? Maybe it’s just news coverage, but I would love to see a leading lab release research on the health effects of subaudible sound in datacenters or other immediately present issues, which would build goodwill towards these further out concerns.
>why is all I hear these industry leaders talking about is the threat of extinction? . . . I would love to see a leading lab release research on the health effects of subaudible sound in datacenters

It is straightforward for industry leaders to avoid living near data centers, but there's no way for them to insulate themselves from the extinction threat -- no way short of somehow eliminating the danger for everybody, which seems quite hard to do. Since industry leaders are as self-centered as everyone else, the extinction threat is what they think about.

Also, you describe the extinction threat as "further out". A lot of us think there is already some small amount of AI extinction risk being incurred every day. I.e., we think the period of danger has already begun.

I see. I wonder how this works out in terms of risk/reward. I suppose if you take extinction as an infinite negative cost, then it would be the only issue worth thinking about. Where I think this line of thinking gets challenging is when you need to think in terms of a counterfactual. A lot of these were already risks prior to AI (bioweapons, nukes, etc.), so I guess the question that matters is the marginal increase in probability as a result of AI. I could get behind this way of framing it more than saying that AI itself is the problem. It’s just that being more capable as a species increases risks. I think a lot of this pushback comes from the fact that it’s often the CEO who stands to gain huge by saying his tool is going to end the world, so we need public buy-in to support it. If instead it was just framed as “general technological advancement” being dangerous but potentially worthwhile, I think more would be on board.
I'm doubtful of this idea that the reason the CEO of Anthropic says that AI could end the world is because he "stands to gain huge" by saying it.

If he is willing to lie to give his corporation some advantage, couldn't he come up with a lie that would sound less absurd and outlandish to the average decision-maker, who doesn't have much time to learn about this particular technology?

It is more likely in my eyes that he says it because he genuinely worries about AI extinction risk -- like many people do who've studied the technology for a long time.