Hacker News
by ryandrake 42 days ago
In my experience, Claude only knows how to spew code. Every problem you want it to solve, it translates into "more code" rather than "less code". You have to very closely code review everything it does, otherwise your codebase is going to just grow and grow, and asymptotically approach 100% debt.

I code review everything that Claude produces, and I'd estimate about 90-95% of the time, my reaction is WOW it works but too much code dude, let's take 3 hours to handhold you through simplifying it until nothing more can be removed.

18 comments

  > let's take 3 hours to handhold you through simplifying it until nothing more can be removed.
This is why I'm unconvinced that AI code makes me faster. Sure, I could produce a million lines an hour but are we running a sprint or a marathon? I don't know about you but I can't sprint a marathon.

I think much of the world of software has become incredibly myopic. I get it, it's a lot harder to win a war than it is to win a battle, but usually taking the easy way out is just deferring the costs to your future. Problem is that those costs accrue interest... Personally? I'm lazy and a cheapskate.

When did programmers stop becoming lazy and start becoming lazy? More importantly, why?

> I think much of the world of software has become incredibly myopic.
> usually taking the easy way out is just deferring the costs to your future. Problem is that those costs accrue interest.

This sums up my thoughts perfectly lately, that is a great way to put it all.

Programmers have never been any good at measuring or estimating their own productivity, and there is no reason to assume that has changed (one could argue there's ample reason to assume the opposite).

Part of the problem as well is that there is some unseen/unnamable "spaghettiness"/"sloppiness"/"whatever" factor that scales very, very poorly. At the beginning it can seem fine, especially when you have some constant speed multiplier like an LLM spitting out code - but the larger exponent of the function that results from this factor being "worse" will eventually outpace that constant multiplier. You will only see it once it's too late, or will never see it at all because of our myopia, as you say.
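
To make the "constant multiplier versus larger exponent" point concrete, here is a rough sketch. It assumes, purely for illustration, that effort in a clean codebase grows like n^p with size n, while a sloppier codebase produced k times faster grows like n^q / k with q > p. The model, the names `crossover_size` and `effort`, and the numbers are all assumptions of the sketch, not measurements from anyone in this thread.

```python
def crossover_size(speedup: float, clean_exp: float, sloppy_exp: float) -> float:
    """Codebase size n at which n**sloppy_exp / speedup equals n**clean_exp.

    Hypothetical model: every parameter here is an illustrative assumption.
    Solving n**q / k == n**p for n gives n == k ** (1 / (q - p)).
    """
    if speedup <= 1 or sloppy_exp <= clean_exp:
        raise ValueError("need speedup > 1 and sloppy_exp > clean_exp")
    return speedup ** (1.0 / (sloppy_exp - clean_exp))


def effort(n: float, exponent: float, speedup: float = 1.0) -> float:
    """Effort to work in a codebase of size n under the assumed power law."""
    return n ** exponent / speedup


if __name__ == "__main__":
    # Assumed numbers: a 100x speed multiplier, linear vs quadratic scaling.
    n_star = crossover_size(speedup=100.0, clean_exp=1.0, sloppy_exp=2.0)
    print("break-even size:", n_star)
    for n in (10, 1_000):
        print(n, effort(n, 1.0), effort(n, 2.0, speedup=100.0))
```

Under those assumed numbers the sloppy approach is cheaper below the break-even size and more expensive above it, no matter how large the constant multiplier is. A bigger multiplier only moves the break-even point further out.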

Same. Luckily I enjoy the process of refactoring and deleting code is nearly arousing, so I get the initial dopamine rush of wow this works, followed by the dopamine rush of "wow now this is cleaner and works so much better". Keeps me in touch with the codebase too.
Pruning code is to software engineers what cancelling plans is to introverts :)

I think I need to work up a Claude skill named marie-kondo, so that when it breathlessly presents its triumphant solution, I can go “yes, but does it spark joy?” And have it go into an aggressive refactor loop with me.

Sounded like fun so had Claude do one up here: https://github.com/fragmede/marie-kondo-ai-skill
Hypothetical future callers, "for extensibility" abstractions, single-use helpers, ceremonial try/except blocks, and options dicts with one key all get culled.

But this is never the problem. Claude WILL NOT abstract and WILL NOT use your abstractions. It finds them all “ceremonial” and the idea that you could add something that might seem indirect that actually dramatically reduces the problem space is almost impossible to convey.

You can watch this in action for any API whose design you’re familiar with in a domain you understand well. If you attempt to design the same API with Claude, you will invariably get a mess of flat, insane types and no reuse. I’m talking an array of tuples of maps of set to map type insanity.

What has been helping is a mandatory pass of “Claudisms”, but even then it can only find the problem and never the solution.

It is so frustrating.

I question any dev who doesn't get aroused by deleting code.

I just removed an entire graphql endpoint - 500 lines of front and back-end code. I may need to be hosed down.

`$JOB` recently introduced the `#red-diffs` Slack channel. I just submitted ` +4 / -28,742`. Pretty proud.
It's so beautiful.
Get a room!
Oh my. I may not want to know what selecting all and then pressing Delete would do to you.
At this point, it's worth asking whether lots of relatively straightforward verbose code is actually significantly worse than the least code necessary for the problem. Obviously, architecture matters. What might matter less is verbosity.

The reason we aimed for minimal "accidental complexity" up to now was directly related to the cost/pain of changing and maintaining that code. Hasn't the economics of maintenance and change shifted so much that accidental complexity isn't actually all that expensive/painful?

I think a bit of refactoring, renaming and restructuring has been helpful for maintainability but recently I've been a little less inclined to worry about the easy readability of function bodies and fine implementation details. It still feels wrong but I can't justify the effort anymore.

> Hasn't the economics of maintenance and change shifted so much that accidental complexity isn't actually all that expensive/painful?

Not while context windows cause decay and larger bills.

The AI's max cognitive load C is larger than a human's, but if codebase size grows unbounded the minimum context needed for a change will eventually surpass C.

It is also a bad idea to let your codebase become only readable by a machine when we are still in the dark about the role machines and people will take in the future. What if you have to go back to manual dev in a now gargantuan codebase?

I've been in a community that makes a lot of cognitive training software. There are some core open source projects that were created without LLMs, but new projects are now mostly created by young people vibe-coding from scratch or forking and modifying the existing projects with an LLM.

The answer to your question is really obvious. The high-effort manually coded projects stick around and the low-effort vibe-coded projects are forgotten about quickly. In the end LLM-driven programming is always going to bring you to a dead-end. There's certain things where I can predict that they're going to fail because it's going to involve certain kinds of complexity they can't and will never be able to deal with. The code gets so bad that even if an expert programmer wanted to make changes it either wouldn't be possible or worth it. A lot of the time the vibecoders are so high off the low-effort sense of empowerment that they don't even realize what they made is completely broken.

Well written software has staying power because it can be understood and built upon. Understanding a problem deeply enough to devise an elegant solution even leads to new possibilities and ideas that will never be conceived with a more superficial understanding.

> Hasn't the economics of maintenance and change shifted so much that accidental complexity isn't actually all that expensive/painful?

I sincerely believe that extensive accidental complexity will ALSO be bad for AI agents. Their quality will diminish as their context windows get filled up with endless amounts of spaghetti and accidental complexity. I feel like we won't fully start feeling those effects for another year or so.

True, yet they have a Moore's Law-like growth going for properties like their context windows. I think the larger problem with letting them be verbose is Occam's razor. The more verbose they are, the more variant behavior they will have, and any variation that is not strictly necessary is likely to include incorrect behavior.
Attention is an O(n^2) algorithm. Combined with Moore's doubling, it will at best produce linear growth (assuming Moore's law is still remotely close to alive)
I don't think many developers like software bloat or what it has meant for our professional reputation, but we would be dishonest if we predicted a future without outcomes where ugly brute force wins given all the constraints. It is not Moore's law that dictates context window size; that is only an analogy, and so far it has been exponential growth that went from well below humans to more than a human can deal with in terms of short-term tasks.
A problem I’ve found is that when you’re adding functionality or refactoring it often leaves unused methods or types behind, at least with multiple devs working on the same codebase.

This unused code gets further modified as time goes on: new functionality is wired in, or it gets further refactored. Usually it’ll still have tests that cover it. It gives the impression of being live code, but it’s not: it’s zombified.

So you get situations where it gets wired up to something and then that something doesn’t work and you wonder why and so you start digging about and you discover it’s because it has been wired into a path that is never executed.

The fog of relatively recent changes sometimes makes it hard to figure out if the code should be unused or if someone just forgot to hook it in as part of a bigger piece of work. Then you find nobody else is really sure either.

So that extra complexity comes at a cost. It can slow you down or trip you up; catch you by surprise.
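
A minimal sketch of how you might go hunting for that kind of zombie code in a Python codebase, using only the standard `ast` module. The function name `unreferenced_functions` is made up for this example, and the approach is deliberately crude: it only checks whether a function's name is mentioned anywhere in the source it is given. As described above, tests keep zombies looking alive, so you would feed it production code only.

```python
import ast


def unreferenced_functions(source: str) -> set[str]:
    """Return names of functions defined in `source` but never mentioned in it.

    Crude heuristic (an assumption of this sketch, not a real dead-code
    analysis): a function counts as "referenced" if its name appears as a
    plain name or as an attribute anywhere in the same source text.
    """
    tree = ast.parse(source)
    defined = set()
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)       # e.g. helper(...)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)     # e.g. obj.helper(...)
    return defined - referenced


if __name__ == "__main__":
    example = (
        "def live():\n"
        "    return 1\n"
        "\n"
        "def zombie():\n"
        "    return 2\n"
        "\n"
        "print(live())\n"
    )
    print(unreferenced_functions(example))
```

It will miss functions reached only through strings, decorators or reflection, and it treats a function that only calls itself as live, so treat the result as a list of suspects to check with whoever last touched them rather than a delete list.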

  > it's worth asking whether lots of relatively straightforward verbose code is actually significantly worse than the least code necessary for the problem.
The question is wrong because reality isn't binary. "We've" never aimed for minimal, except maybe in the very early days or some real edge cases

If you're writing the minimal code you're either writing something very compact/simple[0], or you're wasting too much time and not balancing things.

If you're rewriting everything then you're wasting too much time and introducing too much complexity[1].

You can't write good code by slapping together a bunch of libraries but that doesn't mean you shouldn't use libraries either.

[0] "simple" is an overloaded term. If you're upset by me saying "simple", I'm using the other definition

[1] sed -i [0] "s/simple/complex/g"

I don't think people are talking about the least code possible, just not incredibly verbose and inefficient like what you get by default from llms.

For example I have a game I've been working on for a few years, I do stuff like "implement this simple pseudo physics system to make the bot follow the character like so...etc"

After some planning and back and forth.

It returns mostly working code, a little odd on some edge cases.

But as I've hand coded this thing for years, I could easily look at it and laugh my ass off: it had multiple classes and around 1k lines of code, all kinds of crazy non-performant crap.

The exact thing I needed, I reprogrammed in around 5 lines of very simple code that did exactly what I needed with no edge case weirdness.

Now the vibe coders actually ship that shit. I like to read vibe code games now and again, and there is no possible way those guys are ever shipping a real game, as every single decision is verbose along with the worst performance decisions over and over everywhere.

Sure it can get you some cute little toy projects, but it will absolutely fall apart if you are trying to make real games.

Don't know about saas apps or whatever. Maybe that stuff doesn't matter at all.

With SaaS apps, I've found you either have to hand write a framework for it to use, or put an even greater amount of effort into double-checking and correcting it. Then you can point it at bugs and features, get it to write tests for you, and so on. If the code's too wordy, who cares? Keep the blast radius to self-contained modules and the AI can't mess up too badly. Whenever you abstract something or the work is critical, you need to go back to hand writing everything.

Abstractions are like the structural elements of a house, security is like plumbing or electrical, but individual features are like carpet and paint. When it's working on the superficial stuff, who cares what it gets wrong? Just go rip up the carpet and do it again if you have to.

> Hasn't the economics of maintenance and change shifted so much that accidental complexity isn't actually all that expensive/painful?

They have, but not in the way you mean.

AI knows nothing about software engineering. AI is a technical debt generator.

You can mitigate this somewhat if you put an actual software engineer at the helm with lots of prompting, but at some point the technical debt accrues enough that neither humans nor AI can fix anything.

As an example, this is what happened to OpenClaw. (And why you suddenly stopped reading hype about it.) OpenAI paid millions for literal trash.

A particularly pronounced version of this can often be seen by letting 2 agents review and code in a loop. One agent will find some problems with the code, the other agent will address the review by adding more code.

A good human developer might see that the better way to address the review is to backtrack and pick a different approach. The ai agents seem more prone to getting stuck down bad branches of the decision tree.

Of course it writes a lot of code. It gets paid per token. That's guaranteed future income with every additional line of technical debt.
>> Of course it writes a lot of code. It gets paid per token.

I don't buy it. I think a much more likely reason it leans towards adding code is because deleting code carries inherent risk: it can break things in major ways or minor ways or very visibly or invisibly. Adding new code, on the other hand, is a lot safer: the only parts that can break are those the AI touched inside its own working context. So it doesn't have to go down rabbit holes and potentially create bigger and bigger messes.

Periodically you can also ask it to review the recent changes and see if there is a risk-free way to streamline them.

You can also tell it to periodically summarize the "lessons learned" from the recent session(s)

At some point they’ll introduce “deletion” tokens that cost ten times the regular token price. ;)
Then local models shouldn't suffer from the same problems, but they do. They just aren't trained in the direction of "less code == better long-term maintainability" I'd say, rather than some grand "increased-token-usage" conspiracy.

You can certainly steer them a bit to reduce the issue parent talks about, but they still go into that direction whenever they can, adding stuff on top of stuff, piling hacks/shim on top of other hacks/shims, just like many human developers :)

Training data is the masses of code from everyone.

Restrict that data to just the best of the best, the tersest of the tersest, and we’d see better output. I don’t think people are sharing that kinda stuff (Jane Street’s gems stay locked up), and even if they did my presumption is that it’d be too narrow and demanding for general audiences.

Big hopes for the long future, damned to some degree of mediocrity in the near term mass product.

This was a large part of my problem with Claude Code: it is far too eager to get to the code writing. Matt Pocock's skills and Codex I have found to work together quite well. You still have to ensure design/architecture is being followed, and review carefully obviously, but Codex by default seems to look for a minimal-change approach a lot more than Claude does/ever did.
Here's what I do

Tell it "Do not change any files yet, just listen." Then we discuss the problem. Then I have it write to a file its understanding of the change.

I review that carefully. Then I let it implement. I approve each change after manually looking at it. I already know what it should be doing.

Make smaller changes and check each one carefully before and after.

This is a reasonable approach but has nothing to do with what is being pushed on us from all sides.
I think this is more a byproduct of the way these models are architected. “One more token” is usually much more likely than a “STOP”. Knowing when to stop and doing more with less is also something very hard for human developers.

For me what throws me off most of the time is the structure on the mid-level. It usually makes sense in the loc and maybe project level, but on the file and folder level it just loses reference on what it already has or what it does not need to be too verbose about.

> I think this is more a byproduct of the way these models are architected. “One more token” is usually much more likely than a “STOP”.

That’s not really how it works. An agent wouldn’t get halfway through _any_ implementation and just stop abruptly - it’s not as simple as rolling the dice until you land on “stop”.

It will stop when it believes, for whatever reason, the output has achieved whatever task was laid out. You’re welcome to refine what the definition of that task may be, or you can let it go off.

Hey that’s my exact experience. I started coding the interfaces by hand which helps with the architecture but you still have to say, “don’t add a bunch of helpers and stuff, stick to filling in the stubs.”

Then I only have to spend one hour handholding the clanker to get it perfect. I usually do a lot of manual refactoring as well during that time.

You can also tell it to specifically focus on removing unnecessary code as a pass, and it does that pretty well.
I do one of these occasionally and it usually finds redundancies and/or inconsistencies to clean up. It's very effective and should be part of any process involving agentic coding.
I haven't used Claude, just Sweep, Copilot and whatever JetBrains has. But they've definitely deleted code, not just added it. I know, because they have deleted code that I definitely still needed, and I had to reject those changes and start over on the prompt.
> Every problem you want it to solve, it translates into "more code" rather than "less code"

Try Deepseek or Xiaomi's Mimo. They produce very lean code.

Multiple rounds of specialized sub agent peer reviews that are prompted to counter this work really well. Once you have the right pipeline in place you should not need to do the simplifying yourself very often.

Claude -> specialized sub agents peer review -> specialized sub agents peer review -> repeat as many times as needed

It’s not worth your time until you’ve run through such a pipeline.

It really does want to make everything overcomplicated.

I end most of my pre-plan prompts with "KISS - Keep it simple" to keep it mostly under control.

I also keep each file under 1000 lines and do a full scan of code and docs for cruft every 20-30 task cycles.

Been working on the same project for six months and glad to say there is minimal bloat.
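
For the "each file under 1000 lines" rule, a small check like the one below could run before each cruft scan. This is only a sketch: the function name `oversized_files`, the default limit, and the choice to look only at `.py` files are assumptions for the example, not anything from my actual setup.

```python
from pathlib import Path


def oversized_files(root, max_lines=1000, suffixes=(".py",)):
    """List (path, line_count) for source files at or over `max_lines`.

    Hypothetical helper: the limit and the suffix filter are illustrative
    defaults. "Under 1000 lines" is read as: 1000 or more gets flagged.
    """
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Count lines without loading the whole file into memory.
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count >= max_lines:
            results.append((str(path), count))
    return results


if __name__ == "__main__":
    for name, count in oversized_files("."):
        print(f"{count:6d}  {name}")
```

Anything it prints is a candidate to hand back to the agent with a "split or simplify this" prompt. Line count is a blunt proxy for bloat, so a short file can still be cruft.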

I'm curious how much you have tuned your CLAUDE.md file. You can get very specific and direct about what your expectations/desires are. You can also have another agent do a critical review with your expectations/desires and feed that back.
Just be careful to not put too much in there or it won't have enough attention left over for the tasks.

Look at the doc hub pattern if your {agent}.md file is getting more than ~100 lines.

>asymptotically approach 100% debt

What to do if you're just one dev in an org of 50? Who are all pushing more and more code every PR? I'm gonna have to leave aren't I :(

Pretty much, unfortunately. I was recently told at my job that I need to stop being so critical of AI, and encouraged to at least pretend to use it. Never mind that I've pointed out that my disuse is not because of being stuck in my ways, but because it slows me down rather than speeds me up. I guess management wants AI use no matter whether it pays off. It's clear that my options are to embrace the clanker, or find a new job.
A lot of people seem to think if you give the agent a framework and clear plans that it spews "good" code. I doubt it though.
Try codex.