Hacker News
by lukan 36 days ago
How do you define "bad code"?

If I instruct the AI to make small modules where I can verify they work, have tests and no side effects - then it is good enough code for me. It works, is readable and can be extended - and will turn into bad code if this is not done with care.
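
As a rough illustration of the kind of module described above (small, free of side effects, shipped with its own test), here is a minimal Python sketch. The names `LineItem` and `order_total_cents` are hypothetical and do not come from the post; the point is only that a reviewer can verify the whole unit at a glance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """One order line (hypothetical example type); frozen to prevent mutation."""
    unit_price_cents: int
    quantity: int


def order_total_cents(items: list[LineItem], discount_percent: int = 0) -> int:
    """Return the order total in cents. Pure: no I/O, no globals, no mutation."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    subtotal = sum(item.unit_price_cents * item.quantity for item in items)
    # Integer arithmetic keeps the result exact; the discount amount rounds down.
    return subtotal - (subtotal * discount_percent) // 100


def test_order_total_cents() -> None:
    """The test lives next to the code so both can be reviewed together."""
    items = [LineItem(250, 2), LineItem(100, 1)]
    assert order_total_cents(items) == 600
    assert order_total_cents(items, discount_percent=10) == 540
    assert order_total_cents([]) == 0
```

Because the function touches nothing outside its arguments, the test is the complete specification of its behavior.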

Sure, if you carefully review the agent's output, including tests, you can get good results. If you don't carefully review the output, you obviously have no idea if it's good enough for you. The only way to find out is that 30 changes down the line the agent won't be able to change one thing without breaking another, but by then the codebase will be too far gone to fix.
This is essentially true. There are other ways to achieve this goal, though, that don’t require exhaustive human review; better models are able to do that part as well if properly guided. The key is that yes, some of the design constraints will morph over time, necessarily, since coding is as often about discovering the problem as solving it. But design principles don’t drift. If you have a design principle that cannot be adhered to, it is not a proper principle; it’s an opinion about the problem.

The main thing that helps me in my workflow is to develop documentation around the code. If the code drifts from the docs, the model will notice and you can decide which was correct: the plan, the maintainer manual, the code, or the comments in the code. Notice that there are three separate things written about the code, plus the code itself. Keeping all of that correct, coherent, and consistent (with a separate, invariant document that describes your design principles) keeps the model from going off the rails and gives ample opportunity to sense bad smells before they get set in stone.
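
The comment relies on the model to notice drift. A purely mechanical analog, offered only as a sketch under assumptions, is to compare the function names a document mentions with the public functions a module actually exposes. The `` `name()` `` documentation convention and the helper names below are hypothetical.

```python
import re
import types


def documented_names(doc_text: str) -> set[str]:
    """Collect names written as `name()` in the documentation text."""
    return set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)\(\)`", doc_text))


def public_functions(module: types.ModuleType) -> set[str]:
    """Collect public (non-underscore) callables found on the module."""
    return {
        name
        for name, value in vars(module).items()
        if callable(value) and not name.startswith("_")
    }


def find_drift(doc_text: str, module: types.ModuleType) -> dict[str, list[str]]:
    """Report names present on only one side of the docs/code pair."""
    documented = documented_names(doc_text)
    actual = public_functions(module)
    return {
        "undocumented": sorted(actual - documented),  # in code, not in docs
        "missing": sorted(documented - actual),       # in docs, not in code
    }
```

A check like this only catches naming drift; deciding which side is correct (plan, manual, code, or comments) is still the human decision the comment describes.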

It’s a token fire and you need a minimum 250k context model… but I still get as much work done in an hour as I used to do in a day, and the code I coauthor is better documented, more maintainable, and more tested than any code I have ever written before.

> There are other ways to achieve this goal though, that don’t require exhaustive human review, better models are able to do that part as well if properly guided.

Not at this time. Even if you could somehow get their success rate to 90%, it's still far too low because the mistakes can be (and are occasionally) catastrophic. It's only when you review everything that you find mistakes that will bite you down the line. If you don't review everything, you just don't know, but the rate of bad mistakes introduced by the agents is too high to trust, no matter how much prompting and orchestration you do. Maybe future models will address that, but we're not there yet.

> The main thing that helps me in my workflow is to develop documentation around the code. If the code drifts from the docs, the model will notice and you can decide which was correct, the plan, the maintainer manual, or the code, or the comments in the code.

That's helpful but it doesn't solve the problem, which is that the agents are happy to introduce horrendous workarounds, and they don't tell you that the code they've written is a horrendous workaround. The docs are fine and reflect the code and the code reflects the strategy, but you just don't know that the strategy is wrong.

I haven’t had this problem. Maybe it’s because of the language I’m using (C++) or maybe it’s because of the strict enforcement of modularity and public vs private interfaces, etc that I use? Also, the code is tested against the hardware with every change. Idk if that’s why my experience has been different from yours or not.

My workflow also requires a discussion of the architecture and methodology of each addition or change, but honestly because we define the interfaces first, and each concern is given its own .c and .h file, it’s very hard to sneak something in without me noticing and calling it out. (Which does happen occasionally)

I suspect that file level granularity may be one of the keys. It never is actually working on more than a couple hundred lines of code at a time, plus interfaces of related files. I end up with a hundred files where I might have had 30 coding by hand, but it is actually easier to reason about the code for me as well, and the number of files is not an issue because of the automation. Total LOC is about the same as I would produce by hand for the same work, which means it’s actually writing less, due to the interface overhead, so I’m pretty stoked about that. The only real nightmare for humans is the long includes.
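
The interface-first, one-concern-per-file approach can be sketched as follows. This uses Python rather than the commenter's C/C++ purely for brevity, and every name (`TemperatureSensor`, `FakeSensor`, `average_celsius`) is a hypothetical stand-in. In a real layout each class would sit in its own file, mirroring the .h/.c split.

```python
from typing import Protocol


class TemperatureSensor(Protocol):
    """Public interface, agreed on before any implementation is written."""

    def read_celsius(self) -> float:
        ...


class FakeSensor:
    """Test double that replays canned readings; a real driver would be separate."""

    def __init__(self, readings: list[float]) -> None:
        self._readings = list(readings)  # private state, not part of the interface

    def read_celsius(self) -> float:
        return self._readings.pop(0)


def average_celsius(sensor: TemperatureSensor, samples: int) -> float:
    """Depends only on the interface, never on a concrete driver."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return sum(sensor.read_celsius() for _ in range(samples)) / samples
```

With the interface fixed up front, any change that reaches past `read_celsius` into private state stands out in review, which is the "hard to sneak something in" property described above.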

OTOH if I don’t do all of this it will definitely go off the rails and produce garbage.

I’ve been writing C (and C++) for almost 40 years, and although that doesn’t mean I’m any good, it does mean I have developed a keen sense of smell and highly sensitive olfactory PTSD.

With the right structured environment, a SOTA model with a suspicious seasoned dev holding its hand can be easier to manage and much more productive than a small team. Or, maybe I’ve just sucked so bad my whole life that I can’t tell the difference, but at any rate it works well enough to ship without nightmares, and with fewer bugs and less patching than I had before.

Edit:

I should mention that if bugs get tricky, like hardware idiosyncrasies and things like that, the model just goes nuts. If I handle it very, very carefully so that it does not try to understand the problem, and I just have it poke the firmware with a stick from a distance enough times and from enough angles, it will usually nail it, as long as I have successfully prevented it from trying to figure out the problem (which is not as easy as it seems like it would be). If it starts to guess, it’s usually best just to roll back the context and start over with the poking. (I have a harness so it does direct hardware probes.)

There seems to be an analog of this for non-hardware-related issues, but it’s harder to suss out when you should be telling it that you specifically do not want it to attempt to understand or solve the problem until you’ve rigged and tested all of the debug messaging.

I don't think our experience is different. Letting the agent work on pieces no bigger than a couple hundred lines at a time and checking if there's something fishy or not and that the code is legible and logical is close human supervision. This is very much not what the people who wish AI could build products for them do or can do at the rate they're moving.
Lol I guess you’ve got a point, but honestly it’s not more supervision than I would give a junior dev, at least until they had developed at least a few months’ track record of good judgement.

I guess the problem is the blind assumption of competence?

I just think of AI as being a lot like my late friend Henry. Henry had several PhDs, was an accomplished polymath in a bunch of other subjects, and spoke more than 20 languages with reasonable fluency. He was for sure one of the smartest people I ever met.

He was also prone to drinking, and when he was on a tear, you could barely tell except he would confidently say some of the most outrageous shit, or start speaking some other language without noticing. So you always took Henry with a grain of salt, and if it was important you’d double check. Even so, he was still an amazing resource to bounce things off of.

I get what you mean, but that can also happen with code written by humans.
Nobody is going to argue that humans are incapable of writing bad code.
Sure, by inexperienced ones.
Experienced developers write bad code all the time.
30 years of experience writing bad code, with no effort to improve, doesn't make you any good. You need the right attitude and humility to become good.

Some of the worst programmers I have ever worked with had 30+ years of experience. They basically spent all of their time fixing bug after bug in a never-ending cycle because the software they produced was so fragile that it would crash if you just looked at it wrong or the temperature in the room wasn't perfect.

While others with the same number of years of experience had massive systems in production for years with not a single bug reported by the happy users.

Hm. Some have rather a lot of experience making such a mess themselves.

I mean, for real, is the idea here that all programmers are or were some kind of demigods?

Because this is not what I remember from the pre-LLM time, rather this:

https://xkcd.com/2030/

I know I got into such development hell myself. Fixing a bug here results in breaking something there. Experience surely helps in avoiding it .. but even senior devs can make a mess. Otherwise there wouldn't be so many projects canceled.

So sure, agents can multiply a mess in an amazingly short time, but .. that is up to the humans guiding them.

That is correct. Using an AI to generate code and then not verify it yourself is IMHO unprofessional and should get you at a minimum a verbal warning. YOU are responsible for the code NOT the AI.
I let agents break things 30 changes down the line. If something breaks, I add a check to my project validator and start over, with the validator providing instructions on what was wrong and how to fix it. It's all automatic, and now I have a guard against the exact same error in the future.

Some of these checks have caught thousands of the same error, even with the latest Opus 4.7 writing the original code.
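
The accumulate-a-guard-per-failure loop described above could look roughly like this. It is a sketch under assumptions: the commenter's actual validator is not shown, so the registry, the `check` decorator, and the sample rule are all hypothetical.

```python
import re
from typing import Callable, Optional

# A check inspects source text and returns a fix instruction, or None if clean.
Check = Callable[[str], Optional[str]]
CHECKS: list[Check] = []


def check(fn: Check) -> Check:
    """Register a check; each past failure becomes one permanent guard."""
    CHECKS.append(fn)
    return fn


@check
def no_bare_except(source: str) -> Optional[str]:
    """Hypothetical rule added after an agent swallowed an error silently."""
    if re.search(r"^\s*except\s*:", source, flags=re.MULTILINE):
        return "Bare 'except:' found; catch a specific exception type instead."
    return None


def validate(source: str) -> list[str]:
    """Run every registered check and collect the instructions to feed back."""
    return [msg for fn in CHECKS if (msg := fn(source)) is not None]
```

The returned messages double as the "what was wrong and how to fix it" text handed back to the agent on the next attempt.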

You proved that testing is a good idea, not that vibe coding is a good idea.
To be honest, I am past the point of wanting to convince people that AI is useful. If you want to refuse new tools other people find helpful, your loss.

(Also I stick to the original definition of "vibe coding = not looking at generated code", "LLM assisted coding = verify generated code", I do both, depending on the task)

So basically your only test is "it compiles" since you have no idea what it's actually testing.
How do you think the tests were generated?

You don't actually think I look at the code, do you?

Down the line the agent is no longer able to fix one failure without causing another and the codebase is unsalvageable, but you may not have reached that point yet.

Agents can help a lot when you carefully review everything they output and find all the time bombs they like hiding in your code and your tests. If not, then they're fine for codebases that don't need to last more than a year or two.

The concept of a small module is an architecture invariant. You’re making that decision, not the LLM. And you’ve made that decision because the machine is not good at certain things. You’re doing that because you can’t trust the LLM to make that decision on its own.
I’m doing it because as a DDD adherent, I’ve been building software that way for 15 years without GenAI and now with GenAI I can do it faster.

You can’t play whack-a-mole with GenAI. You have to start from well-known principles and watch everything it produces. Every module or bounded context has to have its own invariants.
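
A minimal sketch of a module guarding its own invariant, in the DDD sense used above. The `Account` aggregate and its rule (the balance never goes negative) are hypothetical examples, not taken from the comment.

```python
class InvariantViolation(Exception):
    """Raised when an operation would break the aggregate's invariant."""


class Account:
    """Aggregate root. Invariant: the balance is never negative."""

    def __init__(self, account_id: str) -> None:
        self.account_id = account_id
        self._balance_cents = 0  # only changed through the methods below

    @property
    def balance_cents(self) -> int:
        return self._balance_cents

    def deposit(self, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise InvariantViolation("deposit must be positive")
        self._balance_cents += amount_cents

    def withdraw(self, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise InvariantViolation("withdrawal must be positive")
        if amount_cents > self._balance_cents:
            raise InvariantViolation("insufficient funds")
        self._balance_cents -= amount_cents
```

Because every state change goes through a method that checks the rule, generated code elsewhere cannot quietly put the aggregate into an invalid state without the violation surfacing.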

You can’t fully automate software engineering with GenAI. It seems the vast majority of GenAI users think they can and end up in the same place as the OP.

Maybe learn Domain-Driven Design, Event Sourcing, and then try again. The results will be dramatically improved.

https://devarch.ai/

Love the DDD callout. I have explicit steps to review and rate deltas to the ubiquitous language, and one of my architectural reviewers will often engage with me about where the bounded contexts should be and where the translation layers will probably be.

I find the more good practices I add to my envision/scope/spec/build/test/deploy loops the happier I am with the outcomes.

I will say that I am finding the actual code to be somewhat ephemeral for me - the more precise the specifications are and generally the tighter and more elegant the design is, the less the code matters as a long term artifact.

I'm not at the "code is assembler" point yet - but I could see that with more, richer specs I could end up there. Of course the specs are then substantial, but declarative specs can be robust and unambiguous (with sufficient red teaming review) and - like domain specific languages - reduce the accidental complexity of the syntax when compared to an implementation in a given language.

There are exceptions to all of this, but it's fascinating to see how it's evolving!

> How do you define "bad code"?

The harder the code is to understand, the badder it is (and the more likely it is infested with bugs).

> How do you define "bad code"?

Code that will not be able to evolve for more than one to two years is terrible code. Agents write terrible code while doing a truly impressive job hiding it (including in the tests they write) unless, of course, you keep them under very close supervision.