On the contrary, it’s worse than useless. If it could fix 12% of bugs (it can’t; it only fixes 12% of their benchmark suite), you’d still have to figure out which 12% of its responses were good. So 88% of the time you’d have wasted time confirming a “fix” that doesn’t work. But it’s worse than that: even for the fixes it got right, you’d still have to fully vet them, because this tool doesn’t know when it can’t solve something and doesn’t ask for clarification. It just gives a wrong answer.
So you didn’t save 12% of your effort; you probably wasted more than double your effort checking the work of a tool that is wrong eight out of nine times.
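To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not a measurement of anything: the cost model and the names `expected_effort`, `break_even_vet_cost`, `vet_cost`, and `fix_cost` are my own assumptions. It assumes you vet every suggestion, and that whenever the suggestion turns out to be wrong you fix the bug yourself at the usual cost.

```python
def expected_effort(success_rate: float, vet_cost: float, fix_cost: float = 1.0) -> float:
    """Expected effort per bug when every suggested fix has to be vetted.

    All parameters are illustrative assumptions, not measured values:
      success_rate: fraction of suggestions that are actually correct
      vet_cost:     effort to vet one suggestion, in the same units as fix_cost
      fix_cost:     effort to fix the bug yourself (the baseline, 1.0)
    """
    # You always pay to vet; when the suggestion is wrong you also fix it yourself.
    return vet_cost + (1.0 - success_rate) * fix_cost


def break_even_vet_cost(success_rate: float, fix_cost: float = 1.0) -> float:
    """Vetting must cost less than this for the tool to save any effort at all."""
    return success_rate * fix_cost


if __name__ == "__main__":
    for vet in (0.1, 0.5, 1.0):
        effort = expected_effort(success_rate=0.12, vet_cost=vet)
        print(f"vet_cost={vet:.1f} -> expected effort {effort:.2f}x baseline")
    print(f"break-even vet_cost: {break_even_vet_cost(0.12):.2f}")
```

Under these assumptions the tool only pays off if vetting a suggestion costs less than 12% of fixing the bug yourself. If vetting costs half as much as a fix, you end up at about 1.38x the baseline effort, and the "more than double" case corresponds to vetting being more expensive than just doing the fix.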
That’s not what I said and you know it. I’m not saying LLMs are useless. I’m not even saying this tool is useless. I’m saying I’m not impressed with this tool, at least as represented in the demo.