Hacker News new | ask | show | jobs
by locknitpicker 1 day ago
> Does AI make incredibly inefficient code most of the time? Yup. But it does it at lightspeed with minimal effort.

This hits the nail in the head.

Detractors often hang on to examples of coding assistants making mistakes or output subpar code, but they somehow miss the fact that coding assistants can also be prompted again and refactor whole swaths of code just as fast as they introduce oopsies. This means that the worst case scenario implies fast convergence to an acceptable outcome, and from there also fast iteration to improve upon that.

5 comments

The problem is that this approach is not sustainable. Errors compound. The cost to fix one issue might seem small at first, but over a stretch of time all these "oopsies" become architectural spaghetti that can only be fixed with a complete rewrite, which will certainly become more expensive than getting the code "organically" developed.

The only way I see AI coding working in the long run is if we go back to a Waterfall/BDUF process and having actual engineering. Let engineers really own the architecture. Enforce that any new feature - no matter how small - to be specced out with complete sequence diagrams. Ensure that every new software package needs to be put on an UML component diagram for the team to review and see each addition interacts with the whole system, etc.

If we do that, then we can just give all the documents to a coding agent and say "go ahead and implement this" with a minimal amount of confidence. But in doing this, I bet we will realize the following:

    - the "effort" has never been about writing code itself. The code is just the material manifest of all the thought that went to think over a solution into the problems that the product is attempting to solve.

   - we will likely be better off by using code generation tools (i.e, UML-to-code) and a "weak" LLM (than can run locally) than by playing the token lottery at the Anthropic Casino.
I mirror your thoughts. I think we'll end up with "perfect map" paradox = you cannot be vague or indecisive on what you want (and if you are then these decisions don't matter) and you're creating a 1:1 representation of what the code needs to be.

I'd substitute "owner" for the team and in that sense the owner will not need to be human.

We're at this state where Claude is great at doing the "middle" part of work, but it's crap at gathering requirements and verification of what it has done. I also don't see people caring about these aspects of software development as shown in the article

> The problem is that this approach is not sustainable. Errors compound. The cost to fix one issue might seem small at first, but over a stretch of time all these "oopsies" become architectural spaghetti that can only be fixed with a complete rewrite, which will certainly become more expensive than getting the code "organically" developed.

That's so far been called software development.

All software developed by people suffers from this issue.

Where exactly is the novelty?

> The only way I see AI coding working in the long run is if we go back to a Waterfall/BDUF process and having actual engineering.

Nonsense. The problem is exactly the same.

With agents iterations are much faster, and this can mean things can get messier faster but can get in shape just as fast.

Ironically, agents improve the quality of the deliverable as well. Approaches such as spec-driven development do a far better job delivering features up to spec than manual coding by flesh and blood developers.

There's an awful lot of baseless scaremongering in your post. You make it sound like with AI assisted coding developers stopped paying any attention to quality.

> Where exactly is the novelty?

The compounding speed. Your devs might reach a point where they have to rewrite and refactor, in a decade.

Your LLM, with its higher throughput, may put you in that game breaking situation next week.

> The compounding speed. Your devs might reach a point where they have to rewrite and refactor, in a decade.

I think that this is exactly why this scaremongering breaks down. If you believe the compounding speed is that greater, wouldn't you be compelled to accept that refactoring and cleaning things up is just as fast and effortless?

I mean, you have a tool that writes software for you following your commands. If you are that concerned with maintainability then what can possibly compell you to not invest any effort in it?

> wouldn't you be compelled to accept that refactoring and cleaning things up is just as fast and effortless?

No. not at all. Imagine that each unit of work (a new PR for a feature, a bugfix) builds something that is 99% close to optimal and you can only get to bring it to 100% if you spend time to really review and rewrite the "not good" part. Also, for the sake of argument, let's just say that the overall quality of the system is geometric mean of the quality score of each unit of work. The only way to get an "ideal" system is by ensuring that work done on it follows the "ideal" architecture - for whatever "ideal" means for the developers/maintainers.

You are arguing that you are saving time because you only have to write the 1% that the AI got wrong, so you'd be getting a 100x speed up. My argument is that there is not so much time because if you want 100% quality, you will have to review 100% of the code. Understanding the produced code is the time-consuming part, not typing it out.

So, the only way to have these time savings by working with coding agents is if you accept that the code generated is good enough to not have careful review. But if you do that, then each unit of work that you tell yourself "not ideal but good enough. Ship it and we refactor later" ends up bringing the overall system quality. If you have 10 of these "99% good enough" PRs, and your overall system score is already at 90%. With 50 of these, the score dives down to 60%.

This is what OP and I are talking about "compounding" issues: unless we get to a point where generated code does not need review at all, your development speed will always be bottle-necked by the human in the loop. The only way to get speed benefits from the code generation is if we remove the human in the loop, but in doing so quality will drop faster than you can fix it.

> All software developed by people suffers from this issue.

And that’s pretty much where you are wrong. Take any long running open source project and you can see the craftsmanship that goes into it. It may not be perfect, but hacks are clearly marked as such.

> And that’s pretty much where you are wrong. Take any long running open source project and (...)

I think you are demonstrating a clear lack of insight and experience in software development settings, including FLOSS projects. I can name you a dozen of fairly known FLOSS projects which are a big ball of mud. Just go to the likes of GitHub, check out the list of popular projects, and peek at their code. You will get a very mixed set of results.

I haven’t used Fable/Mythos yet, but my experience with recent version of Opus, GPT 5.5 and recent Chinese models is that promoting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues. I’ve seen SOTA models make ridiculously stupid architectural decisions that they were then unable to back out of without being prompted very specifically, instead adding a patchwork of “fixes” on top.

I’m not saying that you can’t use AI to do it because I believe that with carefully controlled workflows and context management you can, but it’s not a simple prompt away, it’s requires guidance and understanding, and isn’t the speed demon that raw prompting is.

> I haven’t used Fable/Mythos yet, but my experience with recent version of Opus, GPT 5.5 and recent Chinese models is that promoting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues.

That's not really the point though. That presumes models are only useful if they are one-shot models. That is false.

I mean, what if your prompt successfully changes 20 source files and makes a mess in one? How much work did it saved?

And the elephant in the room is when models actually outperform whatever the prompter is able to deliver, and faster. That is somehow left out.

> That presumes models are only useful if they are one-shot models

That’s not at all what I’m saying.

I’m saying that in my experience across multiple models, the follow up prompts don’t fix prior underlying issues. They usually patch on top instead, unless you give them significant and time consuming guidance.

I want them to be more useful outside of one-shot uses, but I find that they currently miss the mark.

> I’m saying that in my experience across multiple models, the follow up prompts don’t fix prior underlying issues. They usually patch on top instead, unless you give them significant and time consuming guidance.

That's not my experience at all, and I have been using models that are far from being cutting edge. Even in the cases where a model generates utter nonsense, a couple of clarifying questions is all it takes to get it back on track.

But that might be a factor of the project being worked on, and the extension of the changes being asked.

I think this is overlooking the fact that assigning a coding assistant to fix the bugs it re-introduces for all eternity just leads to spiraling token costs, which might cost more than just hiring a competent engineer in the first place.
Maybe. We will see.

I think computers are incredibly cheap compared to humans. These models and infrastructure to run them are going to only get more efficient in time. Right now we are still using (for the most part) entire hardware architectures mostly shoehorned from one purpose (graphics) into another. As purpose-built hardware becomes more prevalent and the SOTA starts to slow down I can't imagine a $100k hardware box not being able to handle a small team of developer's needs for many things.

I do think there will be a place for the top 20% of software engineers forever. But most people are not in that top 20%, and the quality when you get below average is not a linear progression. It will not be that difficult for AI generated code to beat the "bottom end" of the industry since tbh it's hard for me to tell the difference between LLM generated code and some of the shit I've seen over the years. I've ran across code written by folks who don't know what an array is more than once.

Most software is not built by MIT and Stanford grads making $500k/yr in the Valley. It's built by work-a-day programmers in the middle of nowhere making $80k/yr to keep some niche small business going with hyper-specific software that was first designed for Windows 95. Or stuff like making horribly designed Wordpress plugins. Or Shopify integrations. etc. etc.

I've also seen these small businesses totally held back by incompetent programmers, and despite their best efforts and huge amounts (for them!) of investment they can never seem to fix it. These types of enterprises are having AI run circles around their current engineering practices, even if it would make most FAANG engineers gasp in horror.

Either way it will certainly be interesting to watch! I just wish I was closer to retirement.

This has been a debate for ever, long before LLMs. On the one hand you have people who don't care, on the other you have people who produce good code.

Doesn't matter how fast you can make the wrong thing.

Don't forget that you can adjust your requirements (either via plan or skill) to ensure the mistakes do not happen. The problem is that neither LLMs, nor humans (that don't work with the domain) will know they made these mistakes. Even coders don't think about everything all the time
> Don't forget that you can adjust your requirements (either via plan or skill) to ensure the mistakes do not happen.

No, you can't. Adjusting prompts ensures absolutely nothing.

I disagree. What I should have added is that with agents (as well as humans) you do need to have tests that verify what was done.
That assumes you can write automated tests that reliably identify the mistakes over an entire codebase. Nice idea in theory. If it were actually possible, we would long since have generalized libraries of tests to catch every significant security and performance gotcha. What we have are static code analysis tools, fuzzers, etc. None of which have come close to eliminating security and performance problems. I don't see how AI somehome changes that.
Ah, I see what you mean now. Yes, my mind went straight to static analysis and testing (unit, feature, uat, mutation). Thanks for expanding on your point!
In my experience, the refactors are just as bad, just in different ways. All you end up doing is treading water with different iterations of shitty code. By the time you get somewhere acceptable, you could've just fixed it up yourself.

My preferred workflow these days is to pair program with an LLM until it gets close-ish and then manually touch it up. Without that, it just produces junk in different forms.