by wyldfire 149 days ago
I found that when I used ChatGPT's web chat, I frequently went back and forth, shuttling its output to an editor or IDE and then coming back to ChatGPT with "oh, now it fails like this: ...". It made me feel like I was the automaton now.

Claude Code was transformative, and it made me realize that something incredibly significant had occurred. Letting the LLM "drive" like this was inevitable. Now I see exactly how this will transform our industry. I'm a little scared about how it will end for me/us, but excited for now.

2 comments

I've found that "back and forth" is part of the learning aspect LLMs provide. Even when the LLM provides code snippets, I type them out myself, which I've found forces me to think and understand them - often finding flaws and poor assumptions along the way.

Letting an LLM drive in an agentic flow removes you from the equation. Maybe that's what some want - but I've personally found I end up with something that doesn't feel like I wrote it.

Now... get off my lawn!

It's correct, you didn't write it. Do you also avoid using frameworks and libraries out of a desire to feel like you wrote the program you produced? You must have another reason for not wanting to use this code.

When you use frameworks or libraries, you are trusting (hoping) the author(s) spent the time to get it right. At a minimum, the framework/library is documented in literal documentation and/or code that's static and can be read and understood by you. Ask an LLM to do a task 3 times, you'll get 3 different outputs - they're non-deterministic.

I catch a lot of nonsensical and inefficient code when I have that "back and forth" described above - particularly when it comes to architectural decisions. An agent producing hundreds or thousands of lines of code, and making architectural decisions all in one go, will make catching those problems vastly more challenging or impossible.

I've also found reviewing LLM-generated code to be much more difficult and grueling than reviewing my own or another human's code. It's just a mental/brain drain. I've wasted so many hours wondering if I'm just dumb and missing something or not understanding some code - only to later realize the LLM was on the fritz. Having little or no previous context to understand the code creates a "standing at the foot of Mt. Everest" feeling constantly, over and over.

>I've also found reviewing LLM generated code to be much more difficult and grueling than reviewing my own or another human's code.

absolute opposite here.

LLMs, for better or worse, generally stick to paradigms if they have the codebase in front of them to read.

This is rarely the case when dealing with an amateur's code.

Amateurs write functional-ish code. TDD-ish tests. If the language they're using supports it, types will be spotty or inconsistent. Variable naming schemes will change with whatever trend was current when the author wrote that snippet; and whatever format they want to use that day will use randomized vocabulary with lots of non-speak like 'value' or 'entry' in ambiguous roles.

LLMs write gibberish all day, BUT will generally abide by style documents fairly well. Humans... don't.

These things evolve as the codebase matures, obviously, but that's because it was polished into something good. LLMs can't reason well and their logic sometimes sucks, but if the AGENTS.md says that all variables shall be cat breeds -- damnit that's what it'll do (to a fault).

but my point: real logic and reasoning problems become easier to spot when you're not correcting stupid things all day. it's essentially always about knowing how to use the model and whatever platform it's jumping from. Don't give it the keys to create the logical foundation of the code, use it to polish brass.

garbage in -> garbage out ain't going anywhere.

False equivalency. The maintenance and expertise required to run the codebase you’ve generated still fall squarely on you. When you use a library or a framework, it is normally domain experts who do that stuff.

I’m so glad we’ve got domain experts to write those tricky things like left-pad for us.

On a more serious note, I do think that the maintenance aspect is a differentiator, and that if it’s something that you end up committing to your codebase then ownership and accountability fall to you. Externally sourced libraries and frameworks ultimately have different owners.

I'm reminded of the recent "vibe coded" OCaml fiasco[1].

In particular, the PR author's response to this question:

> Here's my question: why did the files that you submitted name Mark Shinwell as the author?

> > Beats me. AI decided to do so and I didn't question it.

The same author submitted a similar PR to Julia as well. Both were closed in part due to the significant maintenance burden these entirely LLM-written PRs would create.

> This humongous amount of code is hard to review, and very lightly tested. (You are only testing that basic functionality works.) Inevitably the code will be full of problems, and we (the maintainers of the compiler) will have to pay the cost of fixing them. But maintaining large pieces of plausible-in-general-but-weird-in-the-details code is a large burden.

Setting aside the significant volume of code being committed at once (13K+ lines in the OCaml example), the maintainers would have to review code even the PR author didn't review - and would likely fall into the same trap many of us have found ourselves in while reviewing LLM-generated code... "Am I an idiot or is this code broken? I must be missing something obvious..." (followed by wasted time and effort).

The PR author even admitted they know little about compilers - making them unqualified to review the LLM-generated code.

[1] https://github.com/ocaml/ocaml/pull/14369

This exchange is so funny. That much time around LLMs and no human feedback really seems to have broken that guy's brain.

Yeah but honestly VSCode with the GitHub Copilot plugin works in a similar way. Might not be as good but it works