Just create a skill for it. I call mine `babysit`. It spins up a subagent that polls the checks every x minutes and auto-fixes failures until everything is green. I've already moved on to the next task while it does that in the background.
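For anyone wondering what that loop boils down to, here is a rough Python sketch. It is not the actual `babysit` skill: `get_status`, `attempt_fix`, the status strings, and the `max_fixes` cap are all hypothetical names made up for illustration, and the cap is an added assumption so the loop cannot run forever.

```python
import time
from typing import Callable


def babysit(
    get_status: Callable[[], str],      # hypothetical: returns "green", "red", or "pending"
    attempt_fix: Callable[[], None],    # hypothetical: asks the agent to fix the failure
    poll_seconds: float = 300.0,        # the "every x minutes" interval
    max_fixes: int = 5,                 # assumed safety cap, not part of the original idea
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the checks are green; return False if the fix budget runs out."""
    fixes = 0
    while True:
        status = get_status()
        if status == "green":
            return True
        if status == "red":
            if fixes >= max_fixes:
                # Give up and hand back to a human instead of looping forever.
                return False
            attempt_fix()
            fixes += 1
        # "pending" (or a fix that was just pushed): wait for the next poll.
        sleep(poll_seconds)
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without actually waiting; in real use the default `time.sleep` is fine.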
About 30% of our AI PR review checks are flawed at a fundamental level, based on poor comprehension of the whole. If I told the AI "fix everything and keep going until it's green" I would be terrified of the result.
I disbelieve this works in anything other than a toy codebase (or an incredibly fine-grained microservice).
The 70% is amazing! But a 30% failure rate requires intense supervision.
It is possible. I tell the agent to use a CLI app, set a timer, and check the status once in a while, especially when something involves a long wait. If it can also run some validators or the same tools locally, that is much faster.
It might tend to deviate and waste time, so it needs guiding once in a while: check what it is spewing out and point it in the correct direction.
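The "run the same tools locally" part could look something like the sketch below, assuming the validators are ordinary commands that exit non-zero on failure. `run_local_checks` is a made-up helper name, and the commands in the usage lines are placeholders rather than a recommendation of specific tools.

```python
import subprocess
from typing import Sequence


def run_local_checks(commands: Sequence[Sequence[str]]) -> list[str]:
    """Run each command locally and return a short description of every failure."""
    failures = []
    for cmd in commands:
        # capture_output keeps the tool's output out of the terminal; the
        # captured text is available in result.stdout / result.stderr.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"{' '.join(cmd)} exited with {result.returncode}")
    return failures


# Usage (placeholder commands; substitute whatever the remote checks actually run):
# failures = run_local_checks([["make", "lint"], ["make", "test"]])
# if failures:
#     print("fix these before pushing:", failures)
```

An empty list means everything passed, so the agent only has to wait on the slow remote run once the quick local pass is clean.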