Hacker News
by xvinci 3 days ago
Not my observation. If you never look at the code and don't have basic guardrails in place (linters, architecture tests, some guidelines for best practices), then probably.

But as soon as you do minimal reviews and high-level corrections, applications turn out just fine.

Can there be bugs? Sure. That's the price of not reading or understanding every line. How many of these you tolerate should depend on the criticality of your software (reviewing, understanding, and testing everything 100%, as you would have if you had written it yourself, will kill most if not all of the speed you gained).

But I never got the impression of unmaintainability or unfixable bugs.

Actually, it's the other way around: a really good cleanup pass, architectural changes, or bugfixes are seldom more than a few prompts and 2 hours away, provided your overall base is decent and you actually gave a fuck from the start.

2 comments

> Can there be bugs? Sure. That's the price of not reading or understanding every line.

I've yet to come across a human developer whose output would meet this standard, despite writing every line.

In fact, having an LLM review our code is catching quite a few bugs before it reaches QA.

Indeed, though I find the distribution is different.

The humans may skip unit tests and need reminding; the AI always writes unit tests once it's in AGENTS.md or whatever, but my experience* was that 5-10% of the time the LLM's attempt at a "test" would, instead of executing the code and examining the results, open the source code as a text file and run a regex to find/exclude certain substrings.

* At the start of this year, because Anthropic and OpenAI were both offering free trials. IDK how much things have changed since then, some things change fast in this domain, other things don't.
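To make the difference concrete, here is a minimal sketch of the two kinds of "test". Everything in it is made up for illustration: `apply_discount`, `BUGGY_SOURCE`, and both check functions are hypothetical and not taken from any real codebase or from actual LLM output. The source is kept as a string so the text-matching check has something to grep, and the same string is then executed for the behavioral check.

```python
import re

# Hypothetical module source with a deliberate bug:
# it adds the discount instead of subtracting it.
BUGGY_SOURCE = '''
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 + percent / 100)
'''

# The same hypothetical source with the sign corrected.
FIXED_SOURCE = BUGGY_SOURCE.replace("1 + percent", "1 - percent")


def source_grep_check(source: str) -> bool:
    """Anti-pattern: matches substrings in the source text, never runs the code."""
    has_function = "def apply_discount" in source
    has_validation = re.search(r"raise ValueError", source) is not None
    return has_function and has_validation


def behavioral_check(source: str) -> bool:
    """Loads the code, calls the function, and examines the result."""
    namespace = {}
    exec(source, namespace)  # stands in for importing the module under test
    apply_discount = namespace["apply_discount"]
    return apply_discount(200, 25) == 150


if __name__ == "__main__":
    print("grep check on buggy code:", source_grep_check(BUGGY_SOURCE))
    print("behavioral check on buggy code:", behavioral_check(BUGGY_SOURCE))
    print("behavioral check on fixed code:", behavioral_check(FIXED_SOURCE))
```

The grep-style check accepts the buggy version because the expected substrings are present, while the behavioral check rejects it because the computed price is wrong. That gap is the reason a text-matching "test" gives little assurance.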

I’ve been piloting LLMs nonstop for the past six months, and we’re at the point where formally verified models generated as an intermediate step between spec and code are very good value.

Riding the exponential means you have to update priors more often.

I have seen some pre-AI over-mocked codebases where the "tests" were essentially that (but harder to read than regex would have been).

What I'm hearing is "thoroughly reviewing AI-generated code would defeat the purpose, so we give it a cursory glance and it seems to be decent code", and that's my point - it does indeed seem to be decent code but I think we're all kicking the can down the road when we operate this way. If the alternative means there's no gain to be had by using LLMs to write code, so be it. Maybe that's the answer. Maybe we shouldn't be relying so much on AI to write our code.

I think LLMs are great for writing small snippets of code that really only have one "best answer" (something simple like writing an array to a CSV), and internal tools, where bugs and security vulnerabilities usually aren't a big deal.
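As a rough idea of the kind of snippet meant here, this is a minimal sketch of writing an array to CSV with Python's standard `csv` module. The function name `rows_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of values) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma-separated, minimal quoting
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [["name", "note"], ["Ada", "likes, commas"]]
    print(rows_to_csv(sample))
```

Even here the library does the part that is easy to get wrong by hand: a field containing a comma is quoted automatically, which a naive `",".join(...)` would not do.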