Hacker News new | ask | show | jobs
by doyougnu 193 days ago
I've interfaced with some AI generated code and after several examples of finding subtle and yet very wrong bugs I now find that I digest code that I suspect coming from AI (or an AI loving coworker) with much much more scrutiny than I used to. I've frankly lost trust in any kind of care for quality or due diligence from some coworkers.

I see how the use of AI is useful, but I feel that the practitioners of AI-as-coding-agent are running away from the real work. How can you tell me about the system you say you have created if you don't have the patience to make it or think about it deeply in the first place?

2 comments

Your coworkers were probably writing subtle bugs before AI too.
Would you rather consume a bowl of soup with a fly in it, or a 50 gallon drum with 1,000 flies in it? In which scenario are you more likely to fish out all the flies before you eat one?
Easier to skim 1000 flies from a single drum than 100 flies from 100 bowls of soup.
Alas, the flies are not floating on the surface. They are deeply mixed in, almost as if the machine that generated the soup wanted desperately to appear to be doing an excellent job making fly-free soup.
… while not having a real distinction between flies and non-fly ingredients.
No, I think it would be far easier to pick 100 flies each from a single bowl of soup than to pick all 1000 flies out of a 50 gallon drum.

You don’t get to fix bugs in code by simply pouring it through a filter.

I think the dynamic is different - before, they were writing and testing the functions and features as they went. Now, (some of) my coworkers just push a PR for the first or second thing copilot suggested. They generate code, test it once, it works that time, and then they ship it. So when I am looking through the PR it's effectively the _first_ time a human has actually looked over the suggested code.

Anecdote: In the 2 months after my org pushed copilot down to everyone the number of warnings in the codebase of our main project went from 2 to 65. I eventually cleaned those up and created a github action that rejects any PR if it emits new warnings, but it created a lot of pushback initially.

Then when you've taken an hour to be the first person to understand how their code works from top to bottom and point out obvious bugs, problems and design improvements (no, I don't think this component needs 8 useEffects added to it which deal exclusively with global state that's only relevant 2 layers down, which are effectively treating React components like an event handling system for data - don't believe people who tell you LLMs are good at React, if you see a useEffect with an obvious LLM comment above it, it's likely to be buggy or unnecessary), your questions about it are answered with an immediate flurry of commits and it's back to square one.

Who are we speeding up, exactly?

Yep, and if you're lucky they actually paste your comments back into the LLM. A lot of times it seems like they just prompted for some generic changes, and the next revision has tons of changes from the first draft. Your job basically becomes playing reviewer to someone else's interactions with an LLM.

It's about as productive as people who reply to questions with "ChatGPT says <...>" except they're getting paid to do it.

I wonder if there’s a way to measure the cost of such code and associate it with the individuals incurring it. Unless this shows on reports, managers will continue believing LLMs are magic time saving machines writing perfect code.