by p1necone 298 days ago
If you're using llms to shit out large swathes of unreviewed code you're doing it wrong and your project is indeed doomed to become unmaintainable the minute it goes down a wrong path architecturally, or you get a bug with complex causes or whatever.

Where llms excel is in situations like:

* I have <special snowflake pile of existing data structures> that I want to apply <well known algorithm> to - bam, half a day's work done in 2 minutes.

* I want to set up test data and the bones of unit tests for <complicated thing with lots of dependencies> - bam, half a day's work done in 2 minutes (note I said to use the llms for a starting point - don't generate your actual test cases with it, at least not without very careful review - I've seen a lot of really dumb ai generated unit tests).

* I want a visual web editor for <special snowflake pile of existing data structures> that saves to an sqlite db and has a separate backend api, bam 3 days' work done in 2 minutes.

* I want to apply some repetitive change across a large codebase that's just too complicated for a clever regex, bam work you literally would have never bothered to do before done in 2 minutes.
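
To make the first bullet concrete, here is a minimal sketch of what "well known algorithm on my own data structures" can look like. `BuildStep` and `build_order` are hypothetical names invented for illustration, standing in for whatever structures a project already has; the algorithm is plain topological sort from Python's standard `graphlib` module (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class BuildStep:
    """Hypothetical existing structure: a named step plus the steps it needs."""
    name: str
    needs: list[str] = field(default_factory=list)


def build_order(steps):
    """Adapt the project's own structure to the shape the algorithm expects."""
    # TopologicalSorter takes a mapping of node -> set of predecessors.
    graph = {step.name: set(step.needs) for step in steps}
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())
```

The chore here is the adapter, not the algorithm: mapping the snowflake structure onto the input a standard implementation wants.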

You don't need to solve hard problems to massively increase your productivity with llms, you just need to shave yaks. Even when it's not a time save, it still lets you focus mental effort on interesting problems rather than burning out on endless chores.

12 comments

> * I want to apply some repetitive change across a large codebase that's just too complicated for a clever regex, bam work you literally would have never bothered to do before done in 2 minutes.

You would naively think that, as did I, but I've tested it against several big name models and they are all eventually "lazy", sometimes make unrelated changes, and get worse as the context fills up.

On a small toy example they will do it flawlessly, but as you scale up to more and more code that requires repetitive changes the errors compound.

Agentic loops help the situation, but now you aren't getting it done in 2 minutes because you have to review to find out it wasn't done and then tell it to do it again N times until it's done.

Having the LLM write a program to make the changes is much more reliable.
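
A rough sketch of the kind of program meant here, assuming the repetitive change is "rename one keyword argument on every call to one function". The function and argument names passed in are hypothetical, and this is only one possible approach: it uses Python's standard `ast` module, and `ast.unparse` (Python 3.9+) drops comments and original formatting, so a real script would likely need a formatting-preserving tool instead.

```python
import ast


class RenameKeyword(ast.NodeTransformer):
    """Rename one keyword argument on calls to one plain function name."""

    def __init__(self, func_name, old_kw, new_kw):
        self.func_name = func_name
        self.old_kw = old_kw
        self.new_kw = new_kw
        self.changed = 0  # count of edits, so nothing is silently skipped

    def visit_Call(self, node):
        self.generic_visit(node)  # visit nested calls inside the arguments too
        if isinstance(node.func, ast.Name) and node.func.id == self.func_name:
            for kw in node.keywords:
                if kw.arg == self.old_kw:
                    kw.arg = self.new_kw
                    self.changed += 1
        return node


def rewrite_source(source, func_name, old_kw, new_kw):
    """Return (new_source, number_of_renames) for one file's source text."""
    tree = ast.parse(source)
    transformer = RenameKeyword(func_name, old_kw, new_kw)
    new_tree = transformer.visit(tree)
    return ast.unparse(new_tree), transformer.changed
```

Unlike a prompt, the script does the same thing on file 500 as on file 1, and the returned count gives you something to check against a grep of the old name.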

> Having the LLM write a program to make the changes is much more reliable.

I ended up doing this when switching our 50k-LOC codebase to pnpm workspaces, and it was such a good experience. It still took me a day or two of moulding that script to get it to handle the dozens of edge cases, but it would have taken me far longer to split things up by hand.

I still feel like I am under-using the ability of LLMs to spit out custom scripts to handle one-off use-cases.

That’s not even a very large code base. My experience is definitely that anything with more than 100K-loc really makes the LLMs struggle.
There is more to it than that. It's about modularization as well.

I run LLMs against a 500k LoC poker engine and they do well because the engine is modularized into many small parts with a focus on good naming schemes and DRY.

If it doesn't require a lot of context for an LLM to figure out how to direct effort then the codebase size is irrelevant -- what becomes relevant in those scenarios is module size and the amount of modules implicated with any change or problem-solving. The LLM codebase 'navigation' becomes near-free with good naming and structure. If you code in a style that allows an LLM to navigate the codebase via just an `ls` output it can handle things deftly.

The LLMification of things has definitely made me embrace the concept of program-as-plugin-loader more-so than ever before.

The app I work on is fairly highly modular, to the point that we split the app in half and unwinding the two halves of the code only took about 2 weeks.

> The LLM codebase 'navigation' becomes near-free with good naming and structure

I have not found this to be true. They seem to break badly if you have a lot of files with similar-ish names even if they're descriptive.

This has the side benefit of likely being easier to navigate for humans too. The less I need to keep in my head to figure something out the better.
Yeah was thinking about this recently. A semantic patch is more reliable, but prompting an ai might be easier. So why not prompt the ai to write the semantic patch?
"bam work you literally would have never bothered to do before done in 2 minutes."

And I would never want to use a piece of software written by you ever.

If you think that writing the code was the hard part, your code was probably always shite.

Yeah well you definitely already do and don't know it so please spare us the pearl clutching.
Care to elaborate?

I'm sure there's a lot of poorly tested code I use, that doesn't mean I want to use it. :)

In fact, I see a lot of broken things in the wild, frequently.

I like the spirit of these, but there are waaaay more. Like you only mentioned the ones for professional and skilled coders who have another option. What about all the sub-examples for people all the way from "technically unskilled" to "baby-step coders". There's a bunch of things they can now just do and get in front of ppl without us.

Going from "thing in my head that I need to pay someone $100/h to try" to "thing a user can literally use in 3 minutes that will make that hypothetical-but-nonexistent $100/h person cry"... like there is way more texture of roles in that territory than your punchy comment gives credit. No one cares if it's maintainable if they now know what's possible, and that matters 1000x more than future maintenance concerns. People spend years working up to this step that someone can now simply jank out* in 3 minutes.

* to jank out. verb. 1. to crank out via vibe-coding, in the sense of productive output.

The fact that people could make excel monstrosities has never really been a real threat to the job security of programmers. IMO it increased it. LLMs are the new excel
Agreed. Video game idea that’s been in my head for years, but not sure if it’s actually fun? Too lazy to sit down for a few days and make it. Went back and forth with an llm for 30 mins and I had more of a game than was even in my head.
*"to vibe out"
I agree with your comment but I wanted to share something that gave me a good chuckle the other day. I had asked claude to write some unit tests that, after reviewing, were sound and actually uncovered a bug in the code-under-test that I had written. When I pointed this out, claude had decided that to make the unit test pass, it would not patch the bug but it would simply not exercise the failing unit test LOL! Good times.

But yeah, LLMs are not good at defining requirements, architectures, or writing a spec to the requirements. They are good at contained, bite-sized asks that don't have many implications outside the code they write.

It has a strong preference to only change pieces of code you asked it to touch.
There's also a middle ground, where you have the AI generate PR reviews and then review them manually. So that 2 minutes of code you spat out (really more like 5-10 using CC) takes another hour or three to review, and maybe 5 to 10 more commits before it's merged in.

I've done this successfully on multiple projects in the 10-20k LOC, ~100 file area - fully LLM generated w/ tons of my personal feedback - and it works just fine. 4/5 features I implement it gets pretty close to nailing from the spec I provide and the work is largely refactoring. But the times it doesn't get it right, it is a slog that could eat the better part of a day. On the whole though it probably is a 3-5x speedup.

I'm a little less confident about doing this on projects that are much bigger... then breaking things up into modules begins to look more attractive.

It's definitely a middle ground, but PR reviews are not perfect. So it's easy to miss a lot of things and to have a lot of extra baggage. From reviewing code it's not always easy to tell exactly what's necessary or duplicate. So I agree, this is a middle ground of using LLMs to be more productive. Removing one bad line of code is worth adding a hundred good lines of code.
> If you're using llms to shit out large swathes of unreviewed code you're doing it wrong

> bam, x days work done in 2 minutes

This is a bit of a misrepresentation, since those two minutes don’t account for the reviewing time needed (to do it properly), which vastly exceeds that time. Otherwise you end up in the situation of “doing it wrong” described in your first paragraph.

Most of these cases don't require "review". It either works or it doesn't.

If you have an LLM transform a big pile of structs, you plug them into your program and it will either compile or it won't.

All programmers write countless one-off throwaway scripts. I can't tell you how many times I've written scripts to generate boring boilerplate code.

How many hours do you spend reviewing such tools and their output? I'll bet anything it's just about zero.
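
For illustration, a minimal sketch of such a throwaway generator plus the cheap "does it even compile" check described above. `render_dataclass` and `compiles` are hypothetical helper names, and the field list is made up; note that a successful compile only shows the output is syntactically valid, not that it is correct.

```python
def render_dataclass(class_name, fields):
    """Return Python source for a dataclass with the given (name, type) fields."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    if not fields:
        lines.append("    pass")  # keep the class body valid when there are no fields
    for field_name, type_name in fields:
        lines.append(f"    {field_name}: {type_name}")
    return "\n".join(lines) + "\n"


def compiles(source):
    """Cheap gate: report whether the generated source parses at all."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:
        return False
    return True
```

Something like `compiles(render_dataclass("Card", [("rank", "int"), ("suit", "str")]))` is the whole review step for this class of tool.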

What do you mean "reviewing" throwaway tools and scripts? If you wrote them yourself, presumably you understand what they do?

I've also spent countless hours debugging throwaway scripts I wrote myself and which don't work exactly like I intended when I try them on test data.

Working in aerospace, code generation tools are indeed reviewed pretty thoroughly.
The implication with that example was it's some editor thing for use during the dev process separate from the actual product, so it doesn't matter if it's disposable and unmaintainable as long as it does the thing you needed it for. If the tool becomes an integral part of your workflow later on you stop and do it properly the second time around.
It’s not a misrepresentation, they’re saying the time it would take to write the code has been reduced to two minutes, not the reviewing and everything else (which still takes just as long)
Reviewing code you didn't write takes much longer than reviewing code you did.
Reviewing code another person wrote also takes longer than code I wrote. Hell reviewing code I wrote six months ago might as well be someone else’s code.

My job right now depending on the week is to either lead large projects dealing with code I don’t write or smaller “full stack” POCs- design, cloud infrastructure (IAC), database, backend code and ETL jobs and rarely front end code. Even before LLMs if I had to look at a project I did it took me time to ramp up.

> Reviewing code another person wrote also takes longer than code I wrote.

Yes, and water is wet, but that's not exactly relevant. If you have an LLM generate slop at you that you have to review and adjust, you need to compare the time this whole process took you rather than just the "generating slop" step to the time needed to write the code by yourself.

It may still save you time, but it won't be anywhere close to 2 minutes anymore for anything but the most trivial stuff.

I have been developing a long time - 10 years as a hobbyist and 30 years professionally. For greenfield work especially, since all of the code I write these days is around the AWS SDKs/CDKs, I find the LLM's code is just as structured as what I would write.

The only refactoring I ended up doing on my current project is extracting functions from a script and creating a library that I reused across other functionality.

Even then I just pasted the library into a new ChatGPT session and told it the requirements of my next piece of functionality and told it to use the library.

I don’t trust an LLM to write more than 200 lines of code at a time. But I hardly ever write more than 200-300 lines at a time.

I can tell you that my latest project has around 1000 lines of Node CDK code between multiple apps (https://aws.amazon.com/cdk/) and around 1000 lines of Python code and I didn’t write a single line of any of it by hand and from reviewing it, it didn’t make any choices that I wouldn’t make and I found some of the techniques it used for the CDK code were things I wouldn’t have thought about.

The SQL code it generated for one of the functions was based on my just giving it the inbound JSON and the create table statements, and it wrote idiomatic MySQL, with parameters (ie no sql injection risk) and no unsafe code.
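
A minimal sketch of what "with parameters" means in practice. This is not the code from that project: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the `orders` table, its columns, and `insert_order` are invented for illustration. Placeholder syntax varies by driver (`?` here).

```python
import json
import sqlite3


def insert_order(conn, payload):
    """Insert one order from an inbound JSON string using bound parameters."""
    record = json.loads(payload)
    # Values are passed separately from the SQL text, so the driver treats
    # them as data and never splices them into the statement.
    conn.execute(
        "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
        (record["customer"], record["total_cents"]),
    )
    conn.commit()
```

A customer name containing quotes or SQL keywords is stored verbatim as text, which is the property to look for when reviewing generated SQL.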

This was a 3-week project for which I would have needed at least one if not two junior/mid-level devs without Gen AI, since I also had to be in customer meetings, write documentation and help sales on another project coming up.

Scaffolding is another area where LLM's work great.

I want to create a new project using framework XYZ. I already know how to do it, but I don't remember how to set it up since I only do that once, or I don't know how to set up a class that inherits from the framework because I usually just copy it from another class in the same project. I can simply tell the bot to write the starting code and take it from there.

The sad thing is for a LOT of use cases an LLM is completely unnecessary. Like why do I even need an LLM for something like this? Why can't I just download a database of code examples, plug it into a search engine that appears in the sidebar, and then just type "new project XYZ" or "new class XYZ.foo" to find the necessary snippet? A lot of NPM frameworks have a set up script to get you started with a new project, but after that you are practically forced to use Google.

It's crazy that a problem that could be solved so easily with a local file search has been ignored for so long and the only solution has been something impossibly inefficient for the problem it's supposed to solve.
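
As a rough sketch of how little machinery that local search would need: the snippet titles below are hypothetical and `search_snippets` is an invented helper that ranks by simple word overlap between the query and each title. A real tool would want better tokenizing and ranking.

```python
def search_snippets(snippets, query, limit=3):
    """Rank snippet titles by how many query words they share, best first."""
    query_words = set(query.lower().split())
    scored = []
    for title in snippets:
        overlap = len(query_words & set(title.lower().split()))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:limit]]


# Hypothetical local snippet database: title -> code example.
SNIPPETS = {
    "new project xyz": "xyz init my-app",
    "new class xyz.foo": "class MyFoo(xyz.Foo): ...",
    "deploy xyz": "xyz deploy --target prod",
}
```

Here `search_snippets(SNIPPETS, "new project XYZ")` puts "new project xyz" first, with no model and no network involved.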

That works as long as the LLM is up to date or you really know the framework / tech well. Otherwise you will be in a fair amount of pain, with little understanding of how to reconcile what it’s got wrong.
Still fleshing out this idea, but it feels recently like LLMs are helping me "throw the first one away". Get the initial starting momentum on something with the LLM, continue to iterate until it mostly works, and then go in and ruthlessly strip out the weird LLMisms. Especially for tedium work where nothing is being designed, it's just transformations or boilerplate.
It's like artificial intelligence isn't intelligent at all but rather semi-useful for tedious, repetitive and non creative tasks. Who would have thought.
What's interesting is that I wouldn't really call any of the things you list software development. With the exception of the "testing starting point", they're mostly about translating from one programming language/API to another. Useful for sure, but not quite "the future of programming". Also, they all sound like the kind of thing that most "basic" models would do well, which means that the "thinking" models are a waste of money.

Finally, the productivity boost is significant from the perspective of the programmer, but I don't know how big it is from the perspective of the employer. Does this significantly shorten time-to-market?

Personal favorite of mine - I want to switch data API but I don't have time to port 2 different services, so here's their documentation. BAM. Done.
YES, this is the way to make AI tools pretty much a strictly positive productivity tool on large codebases.
> work you literally would have never bothered to do before done in 2 minutes.

That has been a nice outcome I didn't expect. Some of the trivial "nice to haves" I can get done by Claude, stuff I just don't have time for.

To your other points I agree as well, I think what's important isn't so much stuffing the context with data, but providing the context with the key insights to the problem you have in your head.

In your first example, perhaps the insight is knowing that you have a special snowflake data structure that needs to be explained first. Or another example is properly explaining historical context of a complex bug. Just saying "hey here's all the code please fix this error" yields less good results, if the problem space is particularly complex.