by daishi55 44 days ago
The generated code is more than fine, it’s good in many cases. And I read it :)

Indeed for the task of “jump into an unfamiliar codebase and make a requested change that aligns with existing styles and patterns, and uses existing functionality” I would say something like opus 4.7 exceeds the capabilities of most developers.

I agree with both statements, but that doesn't change the problem I stated. If an agent produces reasonable code 80-90% of the time, and 10-20% of the time it makes mistakes that could render the codebase irretrievably unevolvable once they accumulate, the only thing you can do is to carefully review the agent's output 100% of the time. That it gets things right 80% of the time as opposed to 40% of the time doesn't change this calculus one iota.

But agents generate code much faster, and to not slow them down, some people want to skip the only thing that can currently ensure you get good results, which is to carefully review the output. Once that happens, there is simply no way for them to know how good or bad what they're getting is.
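
To put rough numbers on the review argument above, here is a minimal sketch. It is an illustration only, not a measurement: it assumes every change is bad independently with the same probability, and the function name `prob_at_least_one_bad` is made up for this example. Under those assumptions, the chance that at least one bad change lands unreviewed after n changes is 1 - (1 - p)^n.

```python
def prob_at_least_one_bad(p_bad: float, n_changes: int) -> float:
    """Chance that at least one of n unreviewed changes is bad.

    Assumes each change is bad independently with probability p_bad,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - p_bad) ** n_changes


if __name__ == "__main__":
    # 0.10 and 0.20 mirror the "right 80-90% of the time" figures above;
    # 0.60 mirrors the "right 40% of the time" comparison.
    for p_bad in (0.10, 0.20, 0.60):
        for n_changes in (1, 5, 20):
            risk = prob_at_least_one_bad(p_bad, n_changes)
            print(f"p_bad={p_bad:.2f} n={n_changes:2d} -> {risk:.3f}")
```

With these assumptions, the gap between a 20% and a 60% error rate shrinks quickly as unreviewed changes pile up, which is the sense in which the better hit rate doesn't change the need to review everything.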

I guess I don't understand how this logic doesn't apply to human developers.
Human developers don't produce code at such a rate, and their judgment is, on average, better. So one, the review doesn't make you feel like you're slowing things down much, and two, the problems are less hidden.
> their judgment is, on average, better

I can only presume you work with talented people somewhere that is not representative of most companies. You're definitely overestimating the average programmer's abilities.

Well, the AI's judgment (i.e. if you accept it) leads to a codebase that cannot handle evolution for more than 18-24 months or thereabouts. If you bother to look you can literally see it rotting at 5x speed (all while passing all tests, especially the ones it writes, right up until the point it collapses and cannot be saved). Since most software codebases last longer, whoever is in charge of the judgment - be they average or not - is obviously doing a far better job than today's LLMs.
I don't agree and in my experience the rot happens way faster in handcrafted codebases with constant requirement ratcheting. You resort to shortcuts and code duplication to avoid breaking existing things. This is just the reality when you work under stress in a growing company. AI is much better at keeping up without deteriorating it.
More code != better. Don't believe me? Which is more valuable, MS Word or all the code written as part of class projects?
And humans produce 100% reasonable code or what? The kind of mess me and everyone I've worked with produces by hand is the inverse of that. Constant shortcuts and lazy slop through and through. Never worked anywhere where the code wasn't an entangled disarray.

As soon as requirements change the abstractions fall apart and everything gets shoehorned.

> And humans produce 100% reasonable code or what?

Humans can be held accountable for their own slop

> The kind of mess me and everyone I've worked with produces by hand is the inverse of that

Yes, it's frustrating to work with, isn't it? So why are you so excited to make higher volumes of this low-quality slop using AI?

I hear you. You're frustrated. Some people are having success with it. What have you tried to do with it?

> Humans can be held accountable for their own slop

If you're a human and AI writes code for you, you are ultimately responsible.

The only people having success with LLMs right now are people who don't actually care about quality. Anyone who cares about producing good work recognized a long time ago that LLMs are not fit for purpose, and isn't relying on them.
Funny how you spend your days spreading this nonsense, as if someone would deny reality just because you keep repeating it. Everyone knows that what you're saying isn't true, so you're wasting your time.
No, humans don't produce 100% reasonable code, but the nature of human mistakes, foibles and unreasonableness is very different from the kind that slop farming yields.