| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by aeldidi 126 days ago

There's an odd trend with these sorts of posts where the author claims to have had some transformative change in their workflow brought upon by LLM coding tools, but also seemingly has nothing to show for it. To me, using the most recent ChatGPT Codex (5.3 on "Extra High" reasoning), it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right. This can still be useful, but is a far cry from what seems to be the online discourse right now.

As a real world example, I was told to evaluate Claude Code and ChatGPT codex at my current job since my boss had heard about them and wanted to know what it would mean for our operations. Our main environment is a C# and Typescript monorepo with 2 products being developed, and even with a pretty extensive test suite and a nearly 100 line "AGENTS.md" file, all models I tried basically fail or try to shortcut nearly every task I give it, even when using "plan mode" to give it time to come up with a plan before starting. To be fair, I was able to get it to work pretty well after giving it extremely detailed instructions and monitoring the "thinking" output and stopping it when I see something wrong there to correct it, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.

It almost feels like this is some "open secret" which we're all pretending isn't the case too, since if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed. I don't mean to sound dismissive, but I really do feel like I'm going crazy here.

43 comments

RealityVoid 126 days ago

You're not going crazy. That is what I see as well. But, I do think there is value in:

- driving the LLM instead of doing it yourself. - sometimes I just can't get the activation energy and the LLM is always ready to go so it gives me a kickstart

- doing things you normally don't know. I learned a lot of command like tools and trucks by seeing what Claude does. Doing short scripts for stuff is super useful. Of course, the catch here is if you don't know stuff you can't drive it very well. So you need to use the things in isolation.

- exploring alternative solutions. Stuff that by definition you don't know. Of course, some will not work, but it widens your horizon

- exploring unfamiliar codebases. It can ingest huge amounts of data so exploration will be faster. (But less comprehensive than if you do it yourself fully)

- maintaining change consistency. This I think it's just better than humans. If you have stuff you need to change at 2 or 3 places, you will probably forget. LLM's are better at keeping consistency at details (but not at big picture stuff, interestingly.)