Hacker News
by ModernMech 18 days ago
Tbf the OP's blog and comments (including their sibling to your comment) are also heavily anecdotal.

> I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.

Claiming a grand inflection point based on your own personal usage is very anecdotal.


If that were it I would absolutely agree with you. But this experience maps exactly to adoption trends. My job in the last 6 months has become so unrecognizable to me it’s insane. The adoption, at the very least at large companies, is truly incredible, and it really does coincide with the quality of Opus 4.5 (which has now been surpassed).
"Adoption trends" are just herd behavior which may or may not be driven by compelling anecdotes and may or may not be evidence of something more. I'm just saying it seems wrong to dismiss the post the way you did when the OP in question and your own post here are just more anecdotes.
No, if that were really true you wouldn’t see what you’re seeing today. You wouldn’t see entire companies completely retooled and refactored around these tools. You would see the mistake of “this is actually just herd behavior”, which involves such a colossal impact on these companies’ entire stack and bottom line, resulting in systemic collapse. You don’t see that. Company leadership are not some idiot class of people; I don’t know why this is people’s prior. If companies get adoption wrong in either direction they are completely screwed. So you’re seeing people putting money where their mouth is, across the board.

Compelling anecdotes are not even the main source of evidence. Look at the enormous body of work on measurement of these systems. I always point people to the Epoch Capability Index as a good summary statistic of capabilities, or METR’s time horizon data, which has now been topped out. They had a recent update to the dataset, after which the corrected plots pointed to an even faster acceleration than before.

> You wouldn’t see entire companies completely retooled and refactored around these tools.

That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.

> resulting in systemic collapse.

Many people are noting the system is collapsing. Maybe it's not going as quickly as you expect, but there's definitely evidence of this from increased service outage frequency, billion dollar notes being passed in a circle between companies, open projects refusing AI contributions entirely because they're overwhelmed by crap, Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.

> Look at the enormous body of work on measurement of these systems.

It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.

> That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.

Not at this scale.

> Many people are noting the system is collapsing.

On HN? Any piece of evidence to support this? Service outage frequency is not a sign of systemic collapse. Billion dollar notes passed in a circle is brought up a lot and misunderstands how finance works. "open projects refusing AI contributions entirely because they're overwhelmed by crap" is not a systemic collapse, it's not being able to adapt to a new world with new challenges. Btw "slop" is getting less and less sloppy.

> Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.

Very interested in a citation for that; it sounds like a headline quote of something more complex.

> It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.

I mean, I work on benchmarks for a living. I can tell you both of these things are true, but only partially, and in aggregate they all tell a consistent story. Not to mention, static OSS benchmarks are not what these companies rely on. They have live traffic, the ability to run A/B tests, and full conversation traces. To ignore this is pretty incredible.

I think my claim about November is looking very solid today.
My point was claiming a broad inflection point based on your own personal usage is not "evidence driven", it's anecdote-driven. It's hard to disprove any claim you made because you didn't really make one that's disprovable, and your opinion on it now is still just an opinion.
Yes, my opinions are driven by anecdotal evidence. I think that's fine: I have a pretty good track record, and I'm careful to share my reasoning.

If you want indisputable, data-driven information about the state of the LLM world I guess you can wait for a peer-reviewed academic paper?

Those have been around for a long, long time. You may be focusing on anecdotes, but the adoption numbers and performance trends speak for themselves, and we’ve had performance trends for years. People can argue about whether or not enterprise-level adoption has a clear ROI today, but the fact is that we’re at the point where entire large-scale companies are already completely refactored, directly after Opus 4.5. If that’s not a convincing enough signal I don’t know what is.
I think we're in agreement then; the point I was responding to was saying your blog was evidence-driven, and we can both agree it's not -- at least to the standard that would pass peer review.
I don't know if there's much in my blog that could be backed by robust evidence. It's not like "product-market fit" is a testable condition.

I did try to provide credible links to back up my assertions about things like enterprise pricing changes.