Hacker News
by jmsdnns 370 days ago
> 25% of developers estimate that 1 in 5 AI-generated suggestions contain factual errors or misleading code.

I cannot believe what's said in the report because it doesn't even reflect what my pro-AI coding friends say is true. Every dev I know says AI-generated suggestions are often full of noise, even the pro-AI folks.

2 comments

I think this really highlights the difference between "pro ai" and "anti ai" people

"It's full of noise but I'm confident I can cut through it to get to the good stuff" - Pro AI

"It's full of noise and it takes more effort to cut through than it would take to just build it myself" - Anti AI

I'm pretty Anti myself. I think "I can cut through the noise" is pretty misplaced overconfidence for a lot of devs

I don't think I would place myself on either side, I guess I'm in the "AI is OK at some stuff" camp.

But if you're getting a lot of noise, I'd immediately try to adjust my system/user prompt to never get that noise in the first place. I'm currently using a variation of https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... which is basically my personal coding guidelines but "codified" as simple rules for LLMs to understand.

For anything besides the dumb models, I get code that more or less looks exactly like how I would have written it myself. When I find I get code back that I'm not happy with, I adjust the system/user prompt further so this time and the next it returns code like how I would have done it.
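
To make that loop concrete, here's a minimal sketch of what "guidelines codified as rules" could look like. The rule texts and the helper names (`build_system_prompt`, `add_rule`) are made-up placeholders for illustration, not the contents of the gist above, and it doesn't call any particular LLM API; it only builds the prompt string.

```python
# Hypothetical guidelines for illustration only (not taken from the linked gist).
RULES = [
    "Return only the function that was asked for, not a whole class or file.",
    "Do not add comments that restate what the code does.",
    "Prefer the standard library over new dependencies.",
]


def build_system_prompt(rules):
    """Join coding guidelines into one numbered system prompt string."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return "Follow these coding guidelines in every answer:\n" + numbered


def add_rule(rules, new_rule):
    """Return a new rule list with new_rule appended, skipping exact duplicates."""
    if new_rule in rules:
        return rules
    return rules + [new_rule]


if __name__ == "__main__":
    # When an answer comes back in a style I don't like, add a rule and reuse the prompt.
    rules = add_rule(RULES, "Use early returns instead of nested conditionals.")
    print(build_system_prompt(rules))
```

The point is just that the rules live in one place, so every fix for a bad answer carries over to the next session instead of being retyped.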

I feel I should clarify

When it comes to judging the quality of AI output, I do agree with "AI is ok at some stuff"

When I say I tend to fall on the Anti AI side, I am saying "But I still don't think it's worth using much"

I don't really want to lean on tools that are just ok at some stuff.

I basically only use AI for tasks that I could also do myself, because those are the tasks where I can find and fix bugs. When used like this, AI can save a lot of typing.

So I guess that puts me into "pro AI" camp, but it's not like we actually disagree.

That's fair

I don't really find that typing is my bottleneck mostly. AI saving me time spent typing code also just costs me time spent prompting and re-prompting the AI so... Kinda a wash mostly?

Why does:

> 25% of developers estimate that 1 in 5 AI-generated suggestions contain factual errors or misleading code.

seem incompatible with "often full of noise", to you?

I can't speak for factual errors, but I'd say less than 20% of the code ChatGPT* gives me contains clear errors — more like 10%. Perhaps that just means I can't spot all the subtle bugs.

But even in the best case, there's a lot of "noise" in the answers they give me: Excess comments that don't add anything, a whole class file when I wanted just a function, that kind of thing.

* Other LLMs are different, and I've had one (I think it was Phi-2) start bad then switch both task *and language* mid-way through.

I'd say for me, it depends on the task and the language. I find asking ChatGPT to generate some code that I copy and paste lines up with your experience. Same with using an agent in a new project. I find the error rate much higher, though, once I start asking it to write code using specific libraries, or when using an agent in an established code base. It's also terrible with DSLs that probably don't have as much training data. Trying to get it to do anything with Azure's KQL is borderline pointless.

Because it is much higher than 25%.

Not my experience at all. 25% sounds really high. I can't even remember the last time it gave me an error that wasn't reasonable (e.g. based on incomplete information) and was just pure noise.

fwiw, i don't mean to suggest AI is pure noise or even that AI isn't worth using. the report just doesn't reconcile with my experiences at all.

my experiences range from helping design penn's new AI degree programs, hearing from friends at algorithmic hedge funds, hearing from friends at startups, and my own development.

I'm curious what types of tasks you're using it for?