Hacker News
by farhanhubble 21 days ago
I'm in the "haven't written any code in a while" boat ATM. I'd love to see examples of issues that are so big that they warrant reverting to manual coding.

My main issue has been the inconsistent quality across model releases and the tendency to insert older APIs or documentation, especially with command line tools.

I can understand if the model struggles with a million line monolithic codebase with a decade of cruft but can't think of why it'd be too much of a pain with new codebases.

10 comments

> I'm in the "haven't written any code in a while" boat ATM

How long do you think it will be before you can't write any code because you're out of practice?

One of the dangers of engineering management is that it can turn you into a person that can no longer do the thing.

Does that even matter?

How long will it be until you can't spin the thread and fabricate your clothes stitch by stitch because you're out of practice?
Precisely my point.
That's fair. In all honesty I'm already feeling challenged, but given how much time I save I can set aside some time to keep myself sharp. I can learn more languages. Additionally, as pointed out by others, I'm trading coding effort for design and strategy, which generally control business outcomes a lot more.

Having said that, I won't use AI for a production system if I don't understand the programming constructs in enough detail.

How many more languages have you learnt, and how much time have you spent keeping yourself sharp? 99% of your work time, right?
> In all honesty I'm already feeling challenged but given how much time I save

And how much is that?

Easily 99 percent on most tasks. As an example, for a Python project with a dozen modules and ~50 files, a simple instruction like "Design a config file backed by Pydantic to store the project's settings. Keep the models modular" sets up nested Pydantic models, moves the settings to sensibly named JSON fields and updates the code to use Pydantic classes everywhere. Takes a few minutes maybe. Done manually, the same task would take me a few hours in the best case and a day in the worst case.
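To make the shape of that result concrete, here is a minimal sketch of modular, nested settings loaded from JSON. It uses stdlib dataclasses as a stand-in so it runs without dependencies; with Pydantic, each class would subclass `BaseModel` instead and gain validation. The section and field names (`database`, `logging`, `host`, `port`, `level`) and the sample host are hypothetical, not taken from the project described above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DatabaseSettings:
    # Hypothetical settings group; one small class per concern keeps things modular.
    host: str = "localhost"
    port: int = 5432


@dataclass
class LoggingSettings:
    level: str = "INFO"


@dataclass
class Settings:
    # Top-level model that nests the per-concern groups.
    database: DatabaseSettings = field(default_factory=DatabaseSettings)
    logging: LoggingSettings = field(default_factory=LoggingSettings)

    @classmethod
    def from_json(cls, text: str) -> "Settings":
        # Missing sections fall back to the defaults declared above.
        raw = json.loads(text)
        return cls(
            database=DatabaseSettings(**raw.get("database", {})),
            logging=LoggingSettings(**raw.get("logging", {})),
        )


settings = Settings.from_json('{"database": {"host": "db.internal", "port": 6543}}')
print(settings.database.host, settings.database.port, settings.logging.level)
```

Callers then read `settings.database.port` instead of reaching into loose dictionaries, which is the part that touches code "everywhere" in a refactor like this.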
AI: I would urge you to reconsider, this is a multi week project.

Me: Do it anyway

10 minutes later

AI: Perfect!

The ability to read code doesn’t decay at nearly the same rate. Neither does your experience.
I read plenty of books, but I'd struggle to write one because that's a different skill that I don't have.
I’m not entirely convinced that’s true. Is there evidence that someone well-read would also be a bad writer?
I review every diff the clanker makes.

After a few hours of this I still look at the codebase and think "wtf is this?".

I think writing the code is a very important part of understanding it. LLM driven development is like doing maintenance programming from day one.

I've used that as well, it's like starting with a legacy app every time.
When every prompt produces a thousand line PR, you’re not very far from another million line monolith.

I’m a little more hopeful than the author though. I feel like it’s possible to manage the process so that does not happen.

It's not difficult to avoid the 1000 lines per PR thing: depending on what kind of thing I am adding, the plan might also include an instruction to keep the code change as small as possible. It still requires judgement, as on something big, the smallest possible code change is not necessarily the most readable, but this is the kind of thing one can decide with some experience and little work.

I've also managed to use LLMs to cut a lot of manual duplication in code where we typically didn't invest enough: "Claude, evaluate code duplication in the functional test suite" will have no problem finding things like insufficient helpers, or tests that are testing simpler things as prerequisites, so they can rely on each other. So I am not seeing my codebases growing all that much. There are some risks of functional changes that before would be rejected due to cost which now are not, but I am not sure how controllable that is without being relatively antagonistic with management.
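For a sense of what the simplest form of that duplication check looks like, here is a rough sketch that groups functions whose bodies are structurally identical, using Python's `ast` module. It only catches exact structural copies, which is far narrower than the review described above; the function name `find_duplicate_functions` and the sample tests are hypothetical.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def find_duplicate_functions(source: str) -> List[List[str]]:
    """Group function names whose bodies have the same AST, ignoring positions."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line/column info by default, so identical code
            # at different locations yields the same fingerprint.
            fingerprint = "\n".join(ast.dump(stmt) for stmt in node.body)
            groups[fingerprint].append(node.name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


SAMPLE = """\
def test_login_ok():
    client = make_client()
    client.login("alice", "secret")
    assert client.is_authenticated()

def test_profile_requires_login():
    client = make_client()
    client.login("alice", "secret")
    assert client.is_authenticated()

def test_logout():
    client = make_client()
    client.logout()
    assert not client.is_authenticated()
"""

print(find_duplicate_functions(SAMPLE))
```

Each reported group is a candidate for a shared helper or fixture; deciding whether the extraction is worth it is still a judgement call.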

Monolith is an ordinary form of software btw - big ball of mud is the problem.
> manage the process so that does not happen

This is the gold, right here.

It doesn't engineer. It writes code. Enthusiastically. Usually without thinking about the bigger picture, the design, the architecture, the trade-offs, etc.

It's up to us to manage that process.

It's why senior engineers are finding LLMs a really useful tool - because we've learned to think about all that other stuff before opening the text editor. Writing the actual code was always the easy (and least valuable) bit.

Here's one that hit the frontpage recently:

https://blog.k10s.dev/im-going-back-to-writing-code-by-hand/

That's indeed a good example.

However I find their claim "I've lead teams of really competent engineers and I can leave them without supervision for months and come back and not feel like throwing away the entire code base." dubious. We all know how much effort it is to keep the quality of even small patches consistent.

Design, architecture, style and refactoring still require significant involvement. Providing only a description and criteria will likely produce hopelessly messy code, which is also what you get with most corporate dev teams.

An electronic medical record system. It's got too many moving parts and they all have to unify.

I've had a lot of success in using LLMs in smaller tasks that I peel off myself, but it's largely due to having an existing architecture that makes sense. The times in which I've tried to let the agent loose to make architectural decisions, it tends to wildly overcomplicate stuff (which I'm only able to recognize because it's a problem I suffer from too).

That's not to say those smaller tasks aren't useful or time-consuming. They're important, but I try to remove the critical assumptions that might need to be made.

Seems like AI isn't really solving complex bugs and issues (that it itself created) in my MAUI project over the last 18 months.

Seems like it is completely hopeless at doing anything netcode consistency and performance related in game dev. Seems like unique game mechanics it doesn't do well either.

Seems like asking it specific UI stylistic changes is basically like throwing darts at a board and hoping it sticks.

It isn't "so big." Things can also be "so small" that it's not worth it.

Coding isn't very hard, so it's often easier to just code than read and write English. I write Haskell exclusively though so this might bias me.

What type of projects do you work on? In particular, how rich are they in novelty, non-googlable data points and non-trivial project-specific deviations from industry standards?
Even if agents failed at that I'd wager that's a very small percentage of software projects anyway.
It's basically every game targeting consoles. Good luck finding any real info about the Nintendo SDK API on Google.
As long as you don't make stuff that others have to rely on, you can live as dangerously as you want.
Even with relatively simple things, frontier models get me about 90% of the way - and this is without evaluating how good that 90% actually is. It's the last 10% that the model fucking sucks at. And it's often the simplest things. It takes a lot of tokens and a lot of time to cajole the AI to get that last 10% working. And even then, I've just given up and had to go read the slop and fix the bug myself because it became so frustrating.
> I'd love to see examples of issues that are so big that they warrant reverting to manual coding

Ah I see your org hasn't yet had an outage caused by a bad LLM code push.

This shouldn't actually change much of anything. We had this happen recently, and were able to roll back within minutes. Devs hand-coding stuff breaks things too. If you already have good observability, fast rollback processes, and you feature-flag new changes plus do percentage-based rollouts to limit the blast radius, then it's more or less the same.
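For readers who haven't built one, a percentage-based rollout can be as small as the sketch below: hash the flag name and user ID into a stable bucket and enable the flag for buckets under the threshold. This is a minimal illustration under assumptions, not any particular feature-flag product; `in_rollout`, the `new-checkout` flag name and the `user-N` IDs are hypothetical.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage for the flag."""
    # Hashing flag + user keeps the decision stable across requests and
    # independent between flags.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < percent * 100  # e.g. percent=5.0 enables buckets 0..499


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(in_rollout("new-checkout", u, 5.0) for u in users)
print(f"{enabled} of {len(users)} users see the new code path")
```

Because a user's bucket never changes, raising the percentage only adds users, and setting it to 0 acts as the fast rollback: the bad code path is switched off without a deploy.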
sounds like bad deployment practices - canaries, guardrails, fast rollbacks, ring based promotions, cell based architecture, blah blah etc... humans write bad code too, there should be systems in place to protect it from releasing
I think people spend way too much time trying to say that LLMs are bad / shouldn't be used / etc because the LLM can't get it right the first time and/or makes mistakes. I think this is because we all hope that software/computers work like this in an ideal world, and LLMs are software.

This is the wrong mental model.

The way to think about an LLM is like a human: prone to following bad examples if it sees them, needs guardrails to catch mistakes, needs code review. It also needs access to what "correct" looks like: architectural design documents, skills that explain each type of change, etc. It needs prompting/skills telling it to follow a safe workflow, telling it to consider how a safe rollout would work, what a safe rollback would look like, what the performance implications are - just like a human.

The nice thing is that you now have a very knowledgeable assistant that can help write additional guardrails that would have always ended at the bottom of your backlog. Perhaps it used to take many hours to research and understand how to write a custom linter to catch a specific coding pattern. Today, ask Claude to do it and an hour later you'll have a custom linter rule for your language of choice, guaranteeing the same mistake can't happen again because CI will block it.
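As an illustration of how small such a rule can be, here is a sketch of a custom check built on Python's `ast` module. The rule itself (flagging bare `except:` clauses) is a hypothetical stand-in for whatever pattern caused the incident, and `find_bare_excepts` and `sample.py` are made-up names.

```python
import ast
from typing import List, Tuple


def find_bare_excepts(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return (line number, message) for every bare 'except:' clause in the source."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # An ExceptHandler with no exception type is a bare 'except:'.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(
                (node.lineno, "bare 'except:' hides errors; catch a specific exception")
            )
    return findings


SAMPLE = """\
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
"""

for lineno, message in find_bare_excepts(SAMPLE):
    print(f"sample.py:{lineno}: {message}")
```

Wired into CI, the script would exit non-zero whenever the findings list is non-empty, which is what actually blocks the merge.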

"Ah I see your org hasnt yet had an outage caused by a bad LLM code push"

"We went back to shovelling by hand because someone ran over the pole with the front-loader, even though he had no experience driving it."

This is definitely user error; obviously it's a hard tool to wrangle but it's entirely possible to use it safely.