by aerhardt 23 days ago
I find this analysis confusing. PMF for coding was likely reached some time last year. Profitability, which is different, we don’t know. The article kind of confuses both without making a strong economic case or using numbers in a compelling way. I don’t understand what the Uber case has to do with this either. The Uber COO clearly said that at least in terms of ROI he’s not seeing the results either.

My take is the product has been very useful for coding (PMF) for months. But it’s certainly not useful at any cost

What I also find confusing though is that folks seem to ignore trajectory which is maybe the biggest lede to bury. As Simon says, we have had "good enough" coding agents for 6 months, that is a blink of an eye, and at my company my job has now completely changed. It's almost like a dream.

And that's just one inflection point. We've had several and there are many more on the horizon. So while I could be convinced that ROI is maybe not even positive today despite the ridiculous enterprise spend, it's perfectly rational to pave the way today for what's coming over the next few months let alone years down the line.

There may be additional major leaps forward, and there may not. I kind of struggle to imagine what the next step actually is. Certainly there will be improvements in performance (speed) and cost. But at a point you reach a barrier where the limiting factor is the specificity of the human prompt and our ability to manage all the code we’re generating.

Somewhat oversimplifying: writing software and building apps was a bottleneck - now it is not. What is the next bottleneck that LLMs can solve? Is there one? And is there enough publicly available data to solve it repeatably at scale? Or did we just automate Stack Overflow searches and now we’re stuck again?

Or is the endgame of this innovation cycle the complete removal of interaction with machines through code? Will we simply interact with machine coworkers purely through natural language? Can an LLM make PowerPoint slides and run a meeting? So far not seeing much progress on that.

Judging from the fact that the Opus 4.5 inflection point was not really anticipated, and we still don’t really know what threshold was crossed that suddenly made agentic coding accessible to so many more people, I think it’s safe to say we don’t know what the thresholds will be until they’re crossed. The fact that we don’t know exactly what they’ll be isn’t a good reason to think there won’t be any more.
> The fact that we don’t know exactly what they’ll be isn’t a good reason to think there won’t be any more.

Nor is it a good reason to think there will be more.

We should expect to see the process slowing down first. Until then we should expect it to continue with pretty high likelihood.

https://substackcdn.com/image/fetch/$s_!_ZW2!,f_auto,q_auto:...

I think we have quite good reason to expect more. As I said, we already know (caveat with your level of irrational skepticism toward the overwhelming evidence) that the best existing models are better than the ones publicly available.
For what it's worth, at PyCon US this year I ran into a few people with access to Claude Mythos and they confirmed that it's notably better at writing code than public Claude Opus 4.7.
> caveat with your level of irrational skepticism toward the overwhelming evidence

If you can talk about my irrational skepticism (because I said that "we don't know the future", I suppose?), can I talk about your total lack of common sense?

Because the economy has been growing in the last decades does not mean that it will keep growing for the next decades. Because LLMs have been improving in the last few years does not mean that they will keep improving in the next few years. Maybe, maybe not, your guess is as good as mine. If you know the future, put your money where your mouth is and invest everything you own in LLM companies.

Your overwhelming evidence is about the past: it has been improving in the past.

Based on how much money is chasing returns, and how steep the slope is, it's almost certain that we are still not at the end of this sigmoid cycle.

Sure, it might start to slow down, but even then we will likely see a doubling in the next 10-15 years.

https://substackcdn.com/image/fetch/$s_!_ZW2!,f_auto,q_auto:...

I am currently eating lunch. Meanwhile Claude is triaging and writing reproducers for 70+ tickets nobody has had time to look at. Next it will attempt to fix them. I have not read the tickets. I will not look at the code until there are review-ready PRs and a code review bot has done the first pass.

In other words, most of the prompting will also go away.

Are you not concerned that you, too, will go away?
Feels like everyone should be on one hand. On the other hand it also feels like a massive recalibration of what companies can/should do. They spend massive amounts of money on AWS, Datadog, GitHub, CircleCI, et al. If it becomes easier to host/roll your own it's a big increase in the demand for engineers.

Ultimately software is everything these days and the economics make the demand insatiable. We've gone through many cycles of "X" but on computers/web/mobile. There's going to be a massive amount of "X" but with AI companies that will need engineers.

Or at least this is what I tell myself to sleep at night.

If I don't stay ahead of the curve, yes. But I can't stop that development. What I can do is leverage the technology enough to be more valuable than those who don't. By e.g. knowing how to set up processes like the above.

Ultimately, we'll need UBI or large scale cuts in working hours or similar if AI progresses to the point of mass unemployment - the alternative would be massive social unrest. In the meantime I expect to keep doing better than average.

yeah but if you have to pay $2k to $3k per month, would you still use it?
me, personally, today? no. My company? Yes.
PMF is this weirdly defined thing where "if you're not sure you have it then you don't".

I think it was clearly useful for months to people who had tried it and taken the time to understand it, but now that knowledge has spread to the point where wallet holders are convinced it's not just a passing fad or hype, so now PMF can be "claimed".

I agree it's weird to say "those people have PMF" though; usually it's something you define for yourself

> PMF is this weirdly defined thing where "if you're not sure you have it then you don't".

I'm not sure if this runs counter to your point or not, but: I don't see any future where LLMs aren't a core part of Software Engineering. The horse is out of the barn. There is no going back.

Yeah but the product is not “LLM” it’s “proprietary frontier model LLM paid by the token”.

And I don’t even necessarily disagree with OP! It’s more like the competition is shifting so quickly that your competitors could undercut your PMF in a blink of an eye.

There will be cheaper solutions. And they will generally be less capable than the more expensive ones. Just like most other products.

But my guess is that the cost of SWEs themselves means that the more expensive ones will be worth the delta to most companies.

But time will tell.

History bears out that cheap and satisficing soundly beats expensive and optimal every time. Until we have smarter and more prescient decision makers in leadership, the bottleneck on output will be the quality of decision making not the quality of code. Trying more things faster and cheaper will win.
Aka the cheap plastic solution always wins.
True but that is maybe 5% of what is being promised by the average booster
Give examples of boosters (average or not) and what they've promised?
> clearly useful for people who took the time to understand it

people -> programmers, I haven’t met a non-developer who reports getting more time out of current AI platforms than they put in. If anything I’ve anecdotally heard the opposite, introducing AI at work creates so much slop (output) it takes more time to process it all without a tangible bump in overall productivity

I have at least a half dozen examples of people not hiring people or buying other tools/subscriptions because they built their own with Claude
PMF implies profitability. I could give away dollars for $0.80 and have unlimited demand but it doesn't mean I've found PMF.
Correct, the cost is part of the economics.

That’s why most here shouldn’t engage in the discussion - they parrot on about benefits without identifying and articulating the costs and, moreover, how it affects the firm’s financial position.

The article also treats the word "good" as load-bearing in a way that should have you questioning their analysis:

"I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done."

Yet it’s backed up by adoption across the industry
MongoDB was once backed up by adoption across the industry. Or for a more recent example, blockchain took off like wildfire across the industry before ultimately fizzling out in all but the most niche applications.

Not saying this trend will do the same, just that the industry adopting something doesn't guarantee its success.

I don’t think those are really comparable. The blockchain was trendy hype, relatively few companies actually adopted it. Where did Netflix use the blockchain? Google?

By comparison almost all tech companies I know have leaned heavily into AI.

leaned heavily = purchased a subscription to Claude, not changed processes around AI.
It’s not supposed to be logical, it’s an LLM evangelism blog that rarely, if ever, has any critical analysis that isn’t pro-industry. Read any/all of the other posts and you won’t find much skepticism but you will find a lot of shilling how great it all is.
I like his other posts. He's bullish on AI, which is fine. I'd like to read a mix of bearish and bullish level-headed takes from people who are subject matter experts. His technical credentials are well past discussion - I love Django, and he comes across as a pretty upbeat but level-headed guy. Certainly beats radical takes in either direction from people who have no clue what they're talking about. It's just this article that I find rather confusing.
The thing that matters most to me is if reading what I wrote teaches you some new things and gives you something useful to think about.

If I make an argument and you disagree that's fine with me, provided I didn't use misinformation or sloppy thinking in making that argument.

That's how I feel about most of your writing. I click through most times when I see you either on the front page or in the comments, and I generally walk away feeling like I have food for thought, without necessarily buying everything wholesale. It's part of why I keep coming back.

My root comment simply represented my two cents about the current post. I don't think anything about the post is outrageously incorrect or anything, just somewhat confusing. You're a very prolific contributor in this community and I don't think me or anyone else that welcomes your takes expects everything you write to rock our collective socks every single time, anyway.

308 posts on AI ethics: https://simonwillison.net/tags/ai-ethics/

52 on AI misuse: https://simonwillison.net/tags/ai-misuse/

149 on the unsolved challenge of prompt injection: https://simonwillison.net/tags/prompt-injection/

40 on slop: https://simonwillison.net/tags/slop/

If you want an "LLM evangelism blog that rarely, if ever, has any critical analysis that isn’t pro-industry" there are plenty out there. I'm not one of them.

People are confusing "excitement" with "evangelism". Your blog is definitely on the pro-AI side of things, but as you say, it's not one-sided or uncritical.
I think you should highlight your exemplary pre-AI writing too.
All of these are about AI misuse, not skepticism of AI. By skepticism I mean doubting whether AI actually delivers on its promises which, based on this last post, sounds like something you think we're already past.

Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products.

It's hard for me to write about skepticism that coding agents deliver on their promises when I've been using them daily and know, for an absolute fact, that they boost my own productivity.

(And that's after taking into account the METR paper that says engineers over-estimate their productivity with these tools.)

I have plenty of doubts about AI delivering on its promises outside of coding. I don't write about AGI because I think it's science-fiction hysteria. I write about slop precisely because it represents a mis-use of AI that demonstrates people completely misunderstanding what it's useful for.

Love when people say "its promises". What specifically are you disappointed with? Simon's posts are high quality and evidence driven. AI has already delivered an incredible amount. Read Epoch for industry trends and analyses, METR too; everything points to a pretty consistent picture.

"Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products."

Oh yes, tons and tons, especially on HN. But the plural of anecdote is not data. Enterprise spend speaks for itself. You are using AI-coded functional products all the time. Do you want like a diff history for the Google codebase or something?

Tbf the OP's blog and comments (including their sibling to your comment) are also heavily anecdotal.

> I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.

Claiming a grand inflection point based on your own personal usage is very anecdotal.

If that were it I would absolutely agree with you. But this experience maps exactly to adoption trends. My job in the last 6 months has become so unrecognizable to me it’s insane, the adoption at the very least at large companies is truly truly incredible, and it really does coincide with the quality of Opus 4.5 (which has now been surpassed).
I think my claim about November is looking very solid today.
And what happens when open models catch up in 6 months or so?