by ben_w 370 days ago
Why does:

> 25% of developers estimate that 1 in 5 AI-generated suggestions contain factual errors or misleading code.

seem incompatible with "often full of noise", to you?

I can't speak for factual errors, but I'd say less than 20% of the code ChatGPT* gives me contains clear errors — more like 10%. Perhaps that just means I can't spot all the subtle bugs.

But even in the best case, there's a lot of "noise" in the answers they give me: Excess comments that don't add anything, a whole class file when I wanted just a function, that kind of thing.

* Other LLMs are different, and I've had one (I think it was Phi-2) start bad then switch both task *and language* mid-way through.

2 comments

I'd say for me, it depends on the task and the language. Asking ChatGPT to generate some code that I copy and paste lines up with your experience, as does using an agent in a new project. The error rate is much higher, though, once I start asking it to write code using specific libraries, or when using an agent in an established codebase. It's also terrible with DSLs that probably don't have as much training data. Trying to get it to do anything with Azure's KQL is borderline pointless.
Because it is much higher than 25%.
Not my experience at all. 25% sounds really high. I can't even remember the last time it gave me an error that wasn't reasonable (e.g. based on incomplete information) and was just pure noise.
fwiw, i don't mean to suggest AI is pure noise or even that AI isn't worth using. the report just doesn't reconcile with my experiences at all.

my experiences range from helping design Penn's new AI degree programs, to hearing from friends at algorithmic hedge funds and at startups, to my own development.

I'm curious what types of tasks you're using it for?