by csomar 23 days ago
All of these are about AI misuse, not skepticism of AI. By skepticism I mean doubting whether AI actually delivers on its promises which, based on this last post, sounds like something you think we're already past.

Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products.


It's hard for me to write about skepticism that coding agents deliver on their promises when I've been using them daily and know, for an absolute fact, that they boost my own productivity.

(And that's after taking into account the METR paper that says engineers over-estimate their productivity with these tools.)

I have plenty of doubts about AI delivering on its promises outside of coding. I don't write about AGI because I think it's science-fiction hysteria. I write about slop precisely because it represents a misuse of AI that demonstrates people completely misunderstanding what it's useful for.

Love when people say "its promises". What specifically are you disappointed with? Simon's posts are high quality and evidence driven. AI has already delivered an incredible amount. Read Epoch for industry trends and analyses, and METR too; everything points to a pretty consistent picture.

"Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products."

Oh yes, tons and tons, especially on HN. But the plural of anecdote is not data. Enterprise spend speaks for itself. You are using AI-coded functional products all the time. Do you want like a diff history for the Google codebase or something?

Tbf the OP's blog and comments (including their sibling to your comment) are also heavily anecdotal.

> I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.

Claiming a grand inflection point based on your own personal usage is very anecdotal.

If that were it I would absolutely agree with you. But this experience maps exactly to adoption trends. My job in the last 6 months has become so unrecognizable to me it’s insane. The adoption, at the very least at large companies, is truly incredible, and it really does coincide with the quality of Opus 4.5 (which has now been surpassed).

"Adoption trends" are just herd behavior which may or may not be driven by compelling anecdotes and may or may not be evidence of something more. I'm just saying it seems wrong to dismiss the post the way you did when the OP in question and your own post here are just more anecdotes.

No, if that were really true you wouldn’t see what you’re seeing today. You wouldn’t see entire companies completely retooled and refactored around these tools. If this were “actually just herd behavior”, a mistake with such a colossal impact on these companies’ entire stack and bottom line would result in systemic collapse. You don’t see that. Company leadership are not some idiot class of people; I don’t know why this is people’s prior. If companies get adoption wrong in either direction they are completely screwed. So you’re seeing people putting their money where their mouth is, across the board.

Compelling anecdotes are not even the main source of evidence. Look at the enormous body of work on measurement of these systems. I always point people to Epoch's capability index as a good summary statistic of capabilities, or METR's time horizon data, which has now been topped out. They had a recent update to the dataset, after which the corrected plots pointed to an even faster acceleration than before.

> You wouldn’t see entire companies completely retooled and refactored around these tools.

That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.

> resulting in systemic collapse.

Many people are noting the system is collapsing. Maybe it's not going as quickly as you expect, but there's definitely evidence of this from increased service outage frequency, billion dollar notes being passed in a circle between companies, open projects refusing AI contributions entirely because they're overwhelmed by crap, Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.

> Look at the enormous body of work on measurement of these systems.

It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.

I think my claim about November is looking very solid today.

My point was that claiming a broad inflection point based on your own personal usage is not "evidence driven", it's anecdote-driven. It's hard to disprove any claim you made because you didn't really make one that's disprovable, and your opinion on it now is still just an opinion.

Yes, my opinions are driven by anecdotal evidence. I think that's fine: I have a pretty good track record, and I'm careful to share my reasoning.

If you want indisputable, data-driven information about the state of the LLM world I guess you can wait for a peer-reviewed academic paper?

Those have been around for a long long time. You may be focusing on anecdotes, but the adoption numbers and performance trends speak for themselves, and we’ve had performance trends for years. People can argue about whether or not enterprise-level adoption has a clear ROI today, but we’re at the point where entire large-scale companies are already completely refactored, directly after Opus 4.5. If that’s not a convincing enough signal, I don’t know what is.

I think we're in agreement then; the point I was responding to was saying your blog was evidence-driven, and we can both agree it's not -- at least to the standard that would pass peer-review.