by bluefirebrand 382 days ago
> It’s mostly right enough.

Honestly this is why your experience is different: your expectations are different (and likely lower). I never find they are "mostly right enough", I find they are "mostly wrong in ways that range from subtle mistakes to extremely incorrect". The more subtly they are wrong, the worse I rate their output actually, because that is what costs me more time when I try to use them

I want tools that save me time. When I use LLMs I have to carefully write the prompts, read and understand, evaluate, and iterate on the output to get "close enough" then fix it up to be actually correct.

By the time I've done all of that, I probably could have just written it from scratch.

The fact is that typing speed has basically never been the bottleneck for developer productivity, and LLMs basically don't offer much except "generate the lines of code more quickly" imo

It's also what you're writing. The GP commenter's bio shows they're a product lead, not a full-time software developer. To make some broad assumptions about what kind of code they're talking about: using an LLM for "write me a Python script that queries the Jira API for all tickets closed in the past week" is a much different task from "change the code in our 15 year old in-house accounting software to handle these tariffs", both in terms of the code that gets written and the consequences of the LLM getting it wrong.

To be clear this isn't a knock on anyone's work, but it does seem to be a source of why "pro-LLM" and "anti-LLM" groups tend to talk past each other.

Sure, but in both cases you are running a real risk of producing incorrect data

If you're a product lead and you ask an LLM to produce a script that gets that output, you still should verify the output is correct

Otherwise you run a real risk of seeming like an idiot later when you give a report on "tickets closed in the past week" and your data is completely wrong. "Why hasn't John closed any tickets this week? Is he slacking off?"... "What? He closed more tickets than anyone..." And then it turns out that the unreliable LLM script excluded him for whatever reason

Of course I understand that people are not going to actually be this careful, because more and more people are trusting LLM output without verifying it. Because it's "right enough" that we are becoming complacent
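One cheap way to catch the "John got excluded" failure is to recount from an independent export and diff the two. This is only a sketch under assumptions: the ticket records are plain dicts, and the field names `assignee` and `closed_at`, as well as the helpers `closed_per_assignee` and `check_report`, are hypothetical and not part of any Jira API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def closed_per_assignee(tickets, now, days=7):
    """Count tickets closed within the last `days` days, per assignee.

    `tickets` is assumed to be a list of dicts with hypothetical keys
    "assignee" (str) and "closed_at" (timezone-aware datetime or None).
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for ticket in tickets:
        closed_at = ticket.get("closed_at")
        # Open tickets have no close date and are skipped.
        if closed_at is not None and cutoff <= closed_at <= now:
            counts[ticket["assignee"]] += 1
    return counts


def check_report(report, tickets, now, days=7):
    """Return {assignee: (reported, recounted)} for every disagreement."""
    expected = closed_per_assignee(tickets, now, days)
    names = set(expected) | set(report)
    return {
        name: (report.get(name, 0), expected.get(name, 0))
        for name in names
        if report.get(name, 0) != expected.get(name, 0)
    }
```

An empty result from `check_report` only means the two counts agree; it says nothing about whether the export itself is complete.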

You're absolutely right. You need to verify the script works, and you need to be able to read the code to see what it's actually doing and if it passes the smell test (as a sibling commenter said, the same way you would for a code snippet off StackOverflow). But ultimately for these bits which are largely rote "take data from API, transform into data format X" tasks, LLMs do a great job getting at least 95% of the way there, in my experience. In a lot of ways they're the perfect job for LLMs: most of the work is just typing (as in, pressing buttons on a keyboard) and passing the right arguments to an API, so why not outsource that to an LLM and verify the output?

The challenge comes when dealing with larger systems. An LLM might suggest Library A for accomplishing a task, but if your codebase already has Library B for that, or maybe Library A but a version from 2020 with a different API, you need to make judgment calls about the right approach to take, and the LLM can't help you there. Same with code style, architecture, how future-proof-but-possibly-YAGNI you want your design to be, etc.

I don't think "vibe coding" or making large changes across big code bases really works (or will ever really work), but I do think LLMs are useful for isolated tasks and it's a mistake to totally dismiss them.

> so why not outsource that to an LLM and verify the output?

I mean sure, why not. My argument isn't that it doesn't work, it's that it doesn't really save time

If you try to have it do big changes you will be swamped reviewing those changes for correctness for a long time while you build a mental model of the work

If you have it do small changes, the actual performance improvement is marginal at best, because small changes already don't take much time or effort to create

I really think that LLM-coding has largely just shifted "time spent typing" to "time spent reviewing"

Yes, past a certain size reviewing is faster than typing. But LLMs still are not producing terribly good output for large amounts of code

I disagree that it doesn't save time for some classes of problems.

As a concrete recent example, I had to write a Python script which checked for any postgres tables where the primary key was of type 'INT' and print out the max value of the ID for each table. I know broadly how to do this, but I'd have to double check which information_schema table to use, the right names of the columns to use, etc. Plus a refresher on direct use of psycopg2 and the cursor API. Plus the typing itself. I just put that query into an LLM and it gave me exactly what I needed, took about 30-60 seconds total. Between the research and typing that's easily 10 minutes saved, maybe closer to 20 really.
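For reference, a script of the kind described here might look roughly like this. It is a minimal sketch, not the commenter's actual script: it assumes psycopg2 is installed, the helper names `quote_ident`, `max_ids`, and `main` are made up for illustration, and treating `data_type = 'integer'` in `information_schema.columns` as "type INT" is an assumption about what was meant.

```python
# Find primary-key columns of type integer via information_schema.
INT_PK_QUERY = """
SELECT tc.table_schema, tc.table_name, kcu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.table_schema = tc.table_schema
 AND kcu.table_name = tc.table_name
JOIN information_schema.columns AS c
  ON c.table_schema = kcu.table_schema
 AND c.table_name = kcu.table_name
 AND c.column_name = kcu.column_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND c.data_type = 'integer'
  AND tc.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY tc.table_schema, tc.table_name
"""


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def max_ids(cursor):
    """Return (schema, table, column, max_value) for each integer PK column.

    `cursor` is any DB-API style cursor, e.g. one from psycopg2.
    Composite primary keys yield one row per integer column.
    """
    cursor.execute(INT_PK_QUERY)
    results = []
    for schema, table, column in cursor.fetchall():
        cursor.execute(
            f"SELECT MAX({quote_ident(column)}) "
            f"FROM {quote_ident(schema)}.{quote_ident(table)}"
        )
        results.append((schema, table, column, cursor.fetchone()[0]))
    return results


def main(dsn):
    # Imported lazily so the helpers above work without psycopg2 installed.
    import psycopg2

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            for schema, table, column, max_id in max_ids(cur):
                print(f"{schema}.{table}.{column}: max = {max_id}")
```

Calling `main("dbname=example")` with a DSN for your own database would print one line per table; the DSN shown is a placeholder. Empty tables report a max of `None`.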

And I mean, no, this example isn't worth the $10 trillion or whatever the economy thinks AI is worth, but given that it exists, I'm happy to take advantage of it.

I don't see a lot of value in "saving 10-20 minutes here and there" tbh

Especially since I'm not ever likely to see any benefit from my employer for that extra productivity

> you still should verify the output is correct

And that's a problem with the workflow, not a problem with the LLM.

It's no different than verifying the information from your Google search or the Stack Overflow answer you found works. But for some reason there are people that have higher expectations of LLM output.

People aren't trying to produce entire codebases in 10 minutes using Stack Overflow, or giving it free rein to refactor the entire codebase

Having poked at a few database queries with subtle errors that compounded with a flawed understanding resulting in wildly incorrect conclusions, [a realistic expansion of] “write me a Python script that queries the Jira API for all tickets closed in the past week” is exactly the place where I expect those fuckups to come from.

They save me a tremendous amount of time, you just need to be smart about what you try to get them to do. _Busy work_ is what you want to focus on, not anything that takes a ton of domain knowledge and intelligence.

Just as an example from today, I had a huge pile of YAML documents that needed to have some transformations done to them -- they were pretty simple and obvious, but I just went into Cursor, gave it a before and after and a few notes, and it wrote a Python script in less than 10 seconds that converted everything exactly the way I needed. Did it save me a day of work? Probably not, but probably an hour or so of looking up Python docs and iterating until I worked out all the syntax errors myself. An hour here and an hour there adds up to a _lot_ of saved time.

I spent more time just writing this comment than I did asking Cursor to write and run that script for me.

Other things I had an LLM do for me just _today_: fix a GitHub Action that was failing, and knock out a developer README for a Helm chart documenting what all the values do -- that's one of the kinds of things where it gets a lot of stuff wrong, but typing speed _is_ the bottleneck. It took me a minute or so to fix the stuff it misunderstood, but the formatting and the bulk of it was fine.

Isn't the article saying it's mainly useful for SW?

I'm an electrical engineer, and the only cases where LLMs were useful were developing Python scripts or translating text into a foreign language that I speak fluently.

They are absolutely garbage for anything electrical engineering related, even coding RTL.

> _Busy work_ is what you want to focus on, not anything that takes a ton of domain knowledge and intelligence

Eh..

Maybe that's more of a sign that we shouldn't be doing busywork in the first place

You are in a magical place if you never have to do busy work.

This. I use LLMs for some tasks, but for more complex issues, I do it myself. I tried to use it for a project by defining each task as clearly as possible, and I spent weeks trying to come up with something useful. Mind you, I achieved 80% of what I wanted after iterating and "telling" the chat that their answers were wrong, and going over the code to double-check if everything was okay. Now I use it for specific, simple tasks if these are work-related, and then use it for random kinds of stuff that I can verify by going to the actual source.

> Mind you, I achieved 80% of what I wanted after iterating and "telling" the chat that their answers were wrong, and going over the code to double-check if everything was okay

I very often read things like this, and I'm surprised how often the person estimates "around 80%" of the work was good. It feels so perfectly tailored to the Pareto Principle

The LLM does the easy 80% (which we usually say takes 20% of the time anyways). Then the human has to go do the harder remaining 20%, only with a much smaller mental model of how the original 80% is fitting together