Hacker News
by peteforde 141 days ago
I suspect that this will drive the folks who insist LLM productivity gains are the real hallucinations truly bonkers.
7 comments

No, the fact that Show HN is spammed with LLM-generated garbage is what drives me bonkers. The Show HNs are in fact living proof of how illusory LLM productivity gains are, because we are overwhelmed with trivial proof-of-concepts that have no merit, not even the merit of a human having put effort into creating something neat, rather than actually interesting software anybody would try or discuss.
Relatedly, r/selfhosted has banned AI-built projects except on Fridays[1] to cope with the increased deluge of garbage, most of which is built for CV padding rather than to make anything useful for the community.

[1] https://old.reddit.com/r/selfhosted/comments/1qfp2t0/mod_ann...

Counterpoint: You won't admit anything generated with LLMs is good? I don't see any evidence of your fairness in your comment, so why should I consider you any differently than the angry dude at the bar complaining over his drinks about how things were in his day?
> You won't admit anything generated with LLMs is good?

Nowhere in my comment did I say this, so this is quite a non-sequitur you've based the following personal attack upon. Regardless of whether it's possible to use LLMs to generate good things, the vast majority of things generated with them are not good, and if the good things exist, they are being drowned out in a sea of spam, increasingly difficult to discover along with the good human-generated content.

I have to say, I would characterise both your comment and the original comment I replied to as being considerably more "unfair" than mine. The first comment was clearly written in such a way to get a rise out of people. Your reply is directly insinuating that I'm out-of-touch and ranting at clouds.

This is a valid observation. I wonder though if people who have been coding for decades, but choose to use AI assistance, would fall under the same AI slop category. It’s an interesting dilemma because the overwhelming amount of content getting posted just ends up breeding a ton of negative feelings towards any amount of AI usage.
It will if you let it. I have lost count of the times the AI has come up with 'I can write you 'x', 'y' or 'z' in a heartbeat, just say the word', and I keep having to steer it back to being a repository of knowledge rather than an overeager, very junior co-worker who can't help wanting to show off their skills.

It's very tiresome. Like an idiot/savant, they're an idiot most of the time and every 10th try you go 'oh, but that's neat and clever'.

I feel like HN is quite divided about that, actually. A couple of days ago I started a survey, which I plan to run monthly, to see how the community feels about "LLM productivity etc". I now have ~250 answers and need a few more for the results to be significant, but so far it looks like >90% report productivity gains from AI tools. Happy if you participate, it only takes a minute: https://agentic-coding-survey.pages.dev/
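For a rough sense of how much sampling noise is left at ~250 answers, here is a minimal sketch of a Wilson score interval for a reported proportion. The count of 225 out of 250 is an illustrative stand-in for ">90% of ~250", not the survey's actual data, and the function name is made up for this example. The interval only covers random sampling error; it says nothing about who chose to answer.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical numbers: 225 "yes" answers out of 250 responses.
    low, high = wilson_interval(225, 250)
    print(f"95% interval: {low:.3f} to {high:.3f}")
```

More answers narrow the interval, but only slowly: the width shrinks roughly with the square root of the number of responses.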
Note that self-reporting productivity gains is a completely unreliable and unscientific metric. One study[1], small in scope but a noteworthy data point, found that, over the course of the study, LLMs reduced productivity by ~20%, but even after the fact the participants felt that on average their productivity had increased by ~20%. This study is surely not the be-all and end-all, and you could find ways to criticise it or say it doesn't apply or they were doing it wrong or whatever reason you think the developers should have had increased productivity, but the point is that people cannot accurately judge their own productivity by vibes alone.

[1] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

If you look at the survey, it's not only about productivity; it's also about usage, model choice etc. But I agree with you that self-reported productivity gains are to be taken with a grain of salt. What else would you propose, though? The goal is to not only rely on benchmarks for model performance but to develop some kind of TIOBE Index for LLMs.
The ever-present rebuttal to all LLM failure anecdotes: you're using the wrong model, you're prompting it wrong, etc. All failures are always the user's fault. It couldn't possibly be that the tool is bad.
Of course, your logic could equally be applied to the opposite position.

Quite a few of us are tired of being told that we're imagining doing what used to take weeks multiple times in an evening.

If it generated something that saved you weeks, I think it's almost certainly because it was used for something you have absolutely zero domain understanding for and would have had to study from scratch. And I, at least, repeatedly do note that LLMs lower the barrier to entry for making proof-of-concepts. But the problem is that (1) people treat that instant gratification as a form of productivity that can replace software engineers. At most, it can make something extremely rough that is suited to one individual's very specific use case, where you mostly work around the plentiful bugs by knowing the landmines are there and not doing the behaviour that trips them; and (2) people spam these low-effort proof-of-concepts, which have no value to other people on account of how rough and lacking in ability to be extended to cover more than one person's use case they are, and this drowns out the content people actually put effort into.

LLMs, when used like this, do not increase productivity on making software worth sharing with other people. While they can knock out the proof-of-concept, they cannot build it into something valuable to anyone but the prompter, and by short-circuiting the learning process, you do not learn the skills necessary to build upon the domain yourself, meaning you still have to spend weeks learning those skills if you actually want to build something meaningful. At least this is true for everything I have observed out of the vibe-coding bubble thus far, and my own extensive experiences trying to discover the 10x boost I am told exists. I am open to being shown something genuinely great that an LLM generated in an evening if you wish to share evidence to the contrary.

There is also the question of the provenance of the code, of course. Could you have saved those weeks by simply using a library? Is the LLM saving you weeks by writing the library "from scratch", in actuality regurgitating code from an existing library one prompt at a time? If the LLM's productivity gain is that it normalized copying and pasting open-source code wholesale while calling it your own, I don't think that's the great advancement for humanity it is portrayed as.

Really like the UX of that survey - super easy to fill out. Is it just a custom web form, or did you use a library?
Yes exactly, it's a standalone Cloudflare page with some custom HTML/CSS that writes to a D1 (Cloudflare SQL DB) for results and rate limits, that's it. I looked at so many survey tools but none offered what I was looking for (simple single-page form, no email, no signup, no tracking), so I built this (with Claude). Thanks for the feedback!
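For anyone curious what "one SQL table for results and rate limits" can look like, here is a rough sketch using Python's built-in sqlite3 as a local stand-in for D1 (which is SQLite-based). This is not the survey's actual code: the table layout, the `client_key` column, the one-hour window, and the function names are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical policy: at most one submission per client key per hour.
WINDOW_SECONDS = 3600


def init_db(conn: sqlite3.Connection) -> None:
    """Create the single table that holds both results and rate-limit data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        " id INTEGER PRIMARY KEY,"
        " client_key TEXT NOT NULL,"
        " answer TEXT NOT NULL,"
        " submitted_at INTEGER NOT NULL)"
    )
    conn.commit()


def submit(conn: sqlite3.Connection, client_key: str, answer: str, now: int) -> bool:
    """Store an answer unless this client already submitted within the window.

    `now` is a Unix timestamp in seconds, passed in so the logic is testable.
    Returns True if the answer was stored, False if it was rate limited.
    """
    cutoff = now - WINDOW_SECONDS
    (recent,) = conn.execute(
        "SELECT COUNT(*) FROM responses WHERE client_key = ? AND submitted_at > ?",
        (client_key, cutoff),
    ).fetchone()
    if recent > 0:
        return False
    conn.execute(
        "INSERT INTO responses (client_key, answer, submitted_at) VALUES (?, ?, ?)",
        (client_key, answer, now),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    print(submit(db, "client-a", "yes", now=1000))
    print(submit(db, "client-a", "yes", now=1500))
```

Reusing the results table for the rate-limit check keeps it to one table and one extra query per submission; what to use as the client key without tracking anyone is left open here.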
They are, but in the sense of net productivity gains.

Responsible people who use their knowledge to review LLM-generated code will produce more - up to their maximum rate of taking responsibility.

Irresponsible people will just smear shit all over the codebase.

The jury is still out on the net effect, and the agents' level of sophistication is a secondary factor.

IMO a productivity gain of about x2 seems about right!
/r/selfhosted also got tons of new submissions, all unmaintainable AI slop. Now that they are only allowed on Fridays, it calmed down again. But I guess folks who insist on AI superiority think that’s a productivity gain.
The people spamming built bad stuff because they don't know any better. They would have built zero software without AI, so to the extent that anyone built anything working at all, it's basically an infinite productivity increase for those people.
AI productivity gains are not found in the slop bucket with projects tossed off after five prompts and zero intention of keeping them alive for the longer run.
It's slop disguised as productivity.