Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, lint-level refactors and then work up from there. In particular, if this is done in parts, one strategy you can use is to:
- add tests
- break files up into smaller parts
- test the smaller parts
- then actually improve behavior
(Which is no different from what you would do as a human.)
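As a rough illustration of the "break files up, then test the smaller parts" steps, here is a minimal Python sketch. The function names and the `name,qty,price` line format are hypothetical, chosen only for the example; the idea is that each extracted helper gets its own small test, and the refactored whole is checked against the original on the same input before any behavior is "improved".

```python
# Hypothetical "before" code: parsing, filtering, and totaling in one loop.
def order_total(lines):
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _name, qty, price = line.split(",")
        total += int(qty) * float(price)
    return round(total, 2)


# "After": the same logic split into small, separately testable parts.
def parse_line(line):
    """Return (name, qty, price), or None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, qty, price = line.split(",")
    return name, int(qty), float(price)


def line_total(item):
    """Cost of one parsed line: quantity times unit price."""
    _name, qty, price = item
    return qty * price


def order_total_refactored(lines):
    items = (parse_line(line) for line in lines)
    return round(sum(line_total(item) for item in items if item is not None), 2)


if __name__ == "__main__":
    sample = ["# name,qty,price", "apple,3,0.5", "", "pear,2,1.25"]
    # Test the smaller parts...
    assert parse_line("# comment") is None
    assert parse_line("apple,3,0.5") == ("apple", 3, 0.5)
    assert line_total(("pear", 2, 1.25)) == 2.5
    # ...and check the refactor still matches the original behavior.
    assert order_total(sample) == order_total_refactored(sample) == 4.0
    print("refactor matches original:", order_total_refactored(sample))
```

Only once the parts are pinned down like this would you move on to the last step and change what the code actually does.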
One of the best things you can do is start by having it write unit tests that cover the existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because whoever does it doesn't know what the right behavior is.
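Tests of this kind are often called characterization tests: they record what the code does today rather than what it ideally should do. Below is a minimal sketch using Python's standard `unittest` module, where `legacy_slugify` is a hypothetical stand-in for existing code. The expected values are simply whatever the current implementation returns, quirks included, so a refactor that changes any of them fails loudly.

```python
import re
import unittest


def legacy_slugify(title):
    # Hypothetical existing code whose behavior we want to pin down before refactoring.
    slug = title.lower().strip()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")


class CharacterizeSlugify(unittest.TestCase):
    # Expected values are what the current code returns, not what it ideally should.
    CASES = {
        "Hello, World!": "hello-world",
        "  padded  ": "padded",
        "C++ Tips": "c-tips",
        "Café au lait": "caf-au-lait",  # quirk: accented letters are dropped today
        "": "",
    }

    def test_current_behavior(self):
        for title, expected in self.CASES.items():
            with self.subTest(title=title):
                self.assertEqual(legacy_slugify(title), expected)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CharacterizeSlugify)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

If a refactor (AI-generated or not) changes the accented-letter quirk, this test fails, and whether that change is a bug or a fix becomes a deliberate decision rather than an accident.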
While I generally agree, in this specific instance, if the AI were “thinking” correctly, it should have found the mistake. I admit it was a difficult problem, though (solving it required creativity).
To be more precise, the prompt actually pointed to where the issues could be, and the issue, which was exactly the kind the prompt described, was still not found.
It's not worth bothering with unless the task is very difficult, long-context, long-running, or all of the above. But, when it's worth using, it genuinely increases success rates and appears to amplify model intelligence.
Pleading has worked for me. After “My job depends on this, please help me,” ChatGPT did a task it had previously claimed it couldn’t do (extracting text from an image; at first it said it couldn’t make the text out).
Asking LLMs to do things in different ways does sometimes get them to answer correctly when an effectively equivalent earlier prompt didn't, but people really go nuts anthropomorphizing this behavior.
ChatGPT has no empathy for you keeping your job; you just lucked into a more helpful predictive-text chain based on some combination of the input and random sampling at the model's temperature setting.
Asking it to just 'try again, dummy' could have worked equally well (or not; it's all just probabilities, after all).
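To make the "it's all just probabilities" point concrete, here is a toy sketch of temperature sampling in Python. The three candidate tokens and their scores (logits) are made up, and real models sample from vocabularies of many thousands of tokens, so treat this as an illustration of the general mechanism under those assumptions, not as how ChatGPT is implemented. It shows that the same scores can yield different continuations depending on the random draw, and that lower temperatures concentrate probability on the top choice.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by its temperature-scaled probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    tokens = ["Sure", "Sorry", "I"]  # made-up candidate next tokens
    logits = [2.0, 1.5, 0.5]         # made-up scores for the same, unchanged input
    for temperature in (0.2, 1.0):
        probs = softmax_with_temperature(logits, temperature)
        print(f"T={temperature}:", [round(p, 3) for p in probs])
    # Same input, different random draws: the chosen token can differ.
    for seed in range(5):
        print("seed", seed, "->", sample_token(tokens, logits, 1.0, random.Random(seed)))
```

Fixing the seed makes a draw reproducible; changing the seed or the temperature can change which token comes out with no change to the input at all.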
I did too, but then, out of frustration, added something very similar ("must be accurate") to the prompt for an AI-backed feature, and sure enough, it fixed the issue. Lord have mercy.