by yumraj 628 days ago
I’ve been thinking about this, since LLMs helped me get something done quickly in languages/frameworks that I had no prior experience in.

But I’ve also realized that, while they are phenomenally good for starting new projects and for small code bases, there are some caveats:

1) one needs to know programming/software engineering in order to use these well. Otherwise, blind copying will hurt, and you won’t know what’s happening when the code doesn’t work

2) production code is a whole different problem that one will need to solve. Copy-pasters will not know what they don’t know, and what they need to know, in order to write production-quality code

3) Maintaining code, adding features, etc. is going to become n times harder the more of the code is LLM-generated. Even large context windows will start failing, and hell, hallucinations may screw things up without anyone even realizing

4) debugging and bug fixing, related to maintenance above, are going to get harder.

These problems may get solved, but till then:

1) we’ll start seeing a lot more shitty code

2) the gap between great engineers and everyone else will become wider


In my opinion this is unlikely to be a real problem. In one breath people say all LLMs give you is Stack Overflow boilerplate, and then in the same breath they claim it is going to produce some unseen, entropic answer.

The truth of the matter is that, yes, organisations are likely to see less uniformity in their codebases, but issues are likely to be more isolated and less systemic. More code will also be pushed faster. Okay, so yes, there is some additional complexity.

However, as they say, if you can't beat 'em, join 'em. The easiest way to stay on top of this will be to use LLMs to review your existing codebase for inconsistencies and to provide overviews and commentary on how it all works, basically simplifying and speeding up work with this additional complexity. The code itself is being abstracted away.

Related discussion we were just having on Mastodon: https://floss.social/@janriemer/113260186319661283
I hadn’t even gone that far in my note above, but that is exactly correct.

We’ll have a resurgence of “edge-cases” and all kinds of security issues.

LLMs are a phenomenal Stack Overflow replacement, and better at creating larger samples than just a small snippet. But, at least at the moment, that’s it.

100% on the SO replacement, which is a shame, as I loved and benefited deeply from SO over the years.

I wonder about the proliferation of edge cases. Probably true, but I have a more optimistic outlook: at least in my own work, LLMs deliver a failing test faster given new information, and the edge case gets covered faster.

Perhaps.

I was referring to the above Mastodon thread, which, if I understood correctly (I just scanned it, didn't get too deep), was about ASCII vs Unicode in generated Rust code. And I was reminded of issues we've come across over the years regarding assumptions around names, addresses, and date/time handling, to name a few.

So, my worry is that generated code will take the easy way out and produce something that gets used; the LLM user won't even realize the issue, since they'll lack deeper experience... and much later, in production, users will run into the "edge-case" bugs. It's a hypothesis at this point, nothing more.
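To make the ASCII-vs-Unicode point concrete, here is a minimal Rust sketch of the kind of assumption that can slip into generated code. The helper names and the sample name are hypothetical, not taken from the Mastodon thread. The traps shown are real `std` behaviour: `str::len` counts UTF-8 bytes rather than characters, `to_ascii_uppercase` leaves non-ASCII letters untouched, and slicing a `&str` at a byte index that falls inside a multi-byte character panics.

```rust
// Hypothetical helpers showing an ASCII-only assumption versus a
// Unicode-aware version. Not taken from the Mastodon thread.

/// Naive: `str::len` returns the UTF-8 byte count, not a character count.
fn naive_name_len(name: &str) -> usize {
    name.len()
}

/// Counts Unicode scalar values. Still not "user-perceived" characters:
/// a separate combining accent would be counted on its own.
fn name_len(name: &str) -> usize {
    name.chars().count()
}

/// Naive: only ASCII letters are uppercased; other letters stay unchanged.
fn naive_shout(name: &str) -> String {
    name.to_ascii_uppercase()
}

/// Unicode-aware uppercasing.
fn shout(name: &str) -> String {
    name.to_uppercase()
}

/// Truncate to at most `max` chars. Slicing by byte index (`&s[..n]`)
/// would panic if `n` fell inside a multi-byte character.
fn truncate_chars(s: &str, max: usize) -> String {
    s.chars().take(max).collect()
}

fn main() {
    let name = "Jos\u{e9}"; // "José" with a precomposed é (2 bytes in UTF-8)
    println!("{} bytes vs {} chars", naive_name_len(name), name_len(name)); // 5 bytes vs 4 chars
    println!("{} vs {}", naive_shout(name), shout(name)); // JOSé vs JOSÉ
    println!("byte 4 is a char boundary: {}", name.is_char_boundary(4)); // false
    println!("{}", truncate_chars(name, 3)); // Jos
}
```

None of these bugs show up with ASCII-only test data, which is why they tend to surface late, with real users' names.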

> 1) we’ll start seeing a lot more shitty code

This one feels like it could be true, but only in terms of the number of lines of code, because more code will be written as AI makes generation faster.

But so much of the code I’ve seen is shitty that I can’t believe we can get materially worse in percentage terms, especially because LLM output is often well commented about what it’s attempting to do.

A big part of the solution to this will be more, more focused, and more efficient QA.

Test-driven development can inherently be cycled until correct (that's loosely analogous to what a Generative Adversarial Network does under the hood, anyhow).
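A minimal sketch of "cycle until the tests pass", assuming some generator (an LLM, say) proposes candidate implementations. The candidates, test cases, and function names below are hypothetical. This is a plain generate-and-test loop against fixed tests, not a GAN, which trains a generator against a learned discriminator.

```rust
/// A test case: an input and the output we expect for it.
struct Case {
    input: i64,
    expected: i64,
}

/// Returns the index of the first candidate that passes every case,
/// or None if all of them fail. In a real workflow each failure would be
/// fed back to the generator to produce the next candidate.
fn first_passing(candidates: &[fn(i64) -> i64], cases: &[Case]) -> Option<usize> {
    candidates
        .iter()
        .position(|f| cases.iter().all(|c| f(c.input) == c.expected))
}

// Hypothetical "generated" attempts at an absolute-value function.
fn attempt_1(x: i64) -> i64 {
    x // forgets about negative inputs
}

fn attempt_2(x: i64) -> i64 {
    if x < 0 { -x } else { x }
}

fn main() {
    let cases = [
        Case { input: 3, expected: 3 },
        Case { input: -4, expected: 4 },
        Case { input: 0, expected: 0 },
    ];
    let candidates: [fn(i64) -> i64; 2] = [attempt_1, attempt_2];
    match first_passing(&candidates, &cases) {
        Some(i) => println!("candidate {} passes all {} cases", i + 1, cases.len()),
        None => println!("no candidate passes; generate another"),
    }
}
```

The loop is only as good as the test cases: `attempt_1` would have "passed" if nobody had written the `-4` case, which is where dedicated QA earns its keep.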

I heard a lot of tech shops gutted their QA departments. I view that as a major error on their part, provided the QA folks are using current, modern tooling (not only GenAI) and not trying to do everything manually.

Many years ago I was at a very large software company that everyone has heard of.

Black-box QA was entirely gutted; only some white-box QA remained. Their titles were changed from QA engineer to software engineer. Devs were supposed to do TDD and that was it, and there’s a fundamental issue there that people don’t even seem to realize.

Anyway, we digress.

And maintenance, including adding features to legacy code and debugging, is much more common (and important) than getting small greenfield projects up and running.
Exactly my point.
> Even large context windows will start failing

What do you mean by that?

If you have a large code base, a software engineer has to look at many files and step through a big stack to figure out the bugs. And that’s before you even get to concurrency and multi-threaded scenarios.

I’m assuming that an LLM will have to ingest the entire code base as part of the prompt to find the problem, refactor the code, or add features whose edits span numerous files.
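A rough back-of-the-envelope check of whether a whole code base fits in one prompt. It assumes about 4 bytes of source per token, a common rule of thumb rather than a real tokenizer, and a hypothetical code base of 2,000 files at 10 KB each: 20,000,000 bytes / 4 ≈ 5,000,000 tokens, about five times a 1M-token window, before counting the instructions and the answer.

```rust
/// Rough rule of thumb (an assumption, not a tokenizer): ~4 bytes of
/// source text per token. Real tokenizers vary by language and model.
const BYTES_PER_TOKEN: u64 = 4;

/// Estimated token count for a set of file sizes given in bytes.
fn estimate_tokens(file_sizes: &[u64]) -> u64 {
    let total: u64 = file_sizes.iter().sum();
    // Round up so a non-empty input never counts as zero tokens.
    (total + BYTES_PER_TOKEN - 1) / BYTES_PER_TOKEN
}

/// Whether the whole code base fits, leaving `reserved` tokens for the
/// instructions and the model's answer.
fn fits_in_window(file_sizes: &[u64], window: u64, reserved: u64) -> bool {
    estimate_tokens(file_sizes) + reserved <= window
}

fn main() {
    // Hypothetical code base: 2,000 files of 10 KB each (~20 MB).
    let sizes = vec![10_000u64; 2_000];
    println!("~{} tokens", estimate_tokens(&sizes)); // prints "~5000000 tokens"
    println!(
        "fits in a 1M-token window: {}",
        fits_in_window(&sizes, 1_000_000, 50_000)
    ); // prints "fits in a 1M-token window: false"
}
```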

So, at some point, even the largest context window won’t be sufficient. So what do you do?

Perhaps RAG over the entire codebase, I don’t know. Smarter people will have to figure it out.
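A minimal sketch of the retrieval half of RAG over a codebase: score chunks against the question, then keep the best ones that fit a token budget. Real systems typically embed chunks and rank by vector similarity; to stay self-contained, this swaps in plain keyword overlap. The `Chunk` type, the sample paths, and the bytes/4 token estimate are all hypothetical.

```rust
use std::collections::HashSet;

/// A chunk of source code and the file it came from (hypothetical type).
struct Chunk {
    path: &'static str,
    text: &'static str,
}

/// Lowercased terms, splitting on anything that isn't alphanumeric,
/// so `apply_discount` yields "apply" and "discount".
fn terms(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Rank chunks by how many distinct query terms they share, then greedily
/// keep the best ones whose estimated cost (bytes / 4, rounded up; a rough
/// assumption) fits in `budget_tokens`. Chunks with no overlap are dropped.
fn retrieve<'a>(chunks: &'a [Chunk], query: &str, budget_tokens: usize) -> Vec<&'a Chunk> {
    let q = terms(query);
    let mut scored: Vec<(usize, &Chunk)> = chunks
        .iter()
        .map(|c| (terms(c.text).intersection(&q).count(), c))
        .filter(|(score, _)| *score > 0)
        .collect();
    // Highest score first; `sort_by` is stable, so ties keep file order.
    scored.sort_by(|a, b| b.0.cmp(&a.0));

    let mut used = 0;
    let mut picked = Vec::new();
    for (_, c) in scored {
        let cost = (c.text.len() + 3) / 4;
        if used + cost <= budget_tokens {
            used += cost;
            picked.push(c);
        }
    }
    picked
}

fn main() {
    // Hypothetical chunks standing in for an indexed codebase.
    let chunks = [
        Chunk { path: "src/billing.rs", text: "fn apply_discount(invoice: &mut Invoice, pct: u8)" },
        Chunk { path: "src/auth.rs", text: "fn verify_token(token: &str) -> bool" },
        Chunk { path: "src/invoice.rs", text: "struct Invoice { total_cents: u64 }" },
    ];
    for c in retrieve(&chunks, "why is the invoice discount wrong?", 1_000) {
        println!("{}", c.path); // src/billing.rs, then src/invoice.rs
    }
}
```

The precision/recall worry raised below shows up even here: a relevant chunk that happens to use different words than the question is never retrieved.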

Hi. I built an AI coding tool that bypasses this problem by letting the human select the relevant code context instead of passing in the entire codebase or using RAG, which has problems with precision and recall.

It takes a bit of effort but the result is much better.

You can check it out: https://prompt.16x.engineer/

That’s an interesting approach and I can see it working, at least working better than the default. It’ll at least work as a stop gap, till there’s a better option. Good luck!!

Also, some unsolicited advice if I may, your pricing is a little wonky, you may want to rethink that. I hate subscription pricing where it doesn’t make sense, but in this case subscription is a better option. Also, your team pricing should be per seat, perhaps with some tiers.

I like to know who is behind a tool; in your case, finding that info takes several clicks. Also, use a more pleasant palette.

Edit: you should do a Show HN.

Thanks for the suggestions.

The app is currently on lifetime license without the need of an account, so it is quite tricky to implement subscription without overhauling the entire app and workflow. I am going to raise price to match the expected LTV of a user multiplied by a discount factor.

For the landing page suggestions, I will see what I can do to add more of a personal touch; for now it is meant to look authoritative and professional.

I listened to the Cursor team on Lex Fridman yesterday. The biggest thing I took away is they have some wild ideas with having agents running in the background that are following what you are creating, trying to find bugs.

I understand what you mean, but surely those guys were thinking about this a while ago now. That part seems obvious from listening to them. They are thinking way beyond that.

Some LLMs already have a context window of 1M tokens, which I believe is already more than any human dev, but yes, I agree that it's not enough to look at it statically. Rather a multistep approach utilizing RAG and/or working directly with the language server would be the way to go. This recent post from Aider about using o1 as an architect seems to me like a good move in this direction - https://aider.chat/2024/09/26/architect.html