| HN Mirror

On the contrary, it’s worse than useless. If it could fix 12% of bugs (it can’t — it only fixes 12% of their benchmark suite), you’d still have to figure out which 12% of the responses it gave were good. So, 88% of the time you’d have wasted time confirming a “fix” that doesn’t work. But it’s worse than that. Because even on the fixes it got right, you’d still have to fully vet it, because this tool doesn’t know when it can’t solve something, or ask for clarification. It just gives a wrong answer.

So you didn’t save 12% of your effort, you wasted probably more than double your effort checking the work of a tool that is wrong eight out of nine times.