by builtfordevs 116 days ago
This is a really interesting approach to automation! The idea of treating it like an "async teammate" rather than a copilot is a clever mental model.

For trust, I'd want to see metrics on how often the implementation is "right enough" on the first try versus how often it needs significant rework. The confidence threshold tuning sounds crucial: too conservative and it barely helps, too aggressive and you spend more time fixing its output than you would coding from scratch.
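
To make that trade-off concrete, here's a rough sketch of the kind of report I have in mind. It assumes you log a confidence score and a merged-without-rework flag for each ticket; the `Run` record, its field names, and the sample numbers are all made up for illustration and not taken from your tool.

```python
from dataclasses import dataclass


@dataclass
class Run:
    confidence: float  # hypothetical self-reported score in [0, 1]
    accepted: bool     # True if the change merged without significant rework


def sweep(runs, thresholds):
    """For each threshold, report coverage and first-try acceptance rate."""
    rows = []
    for t in thresholds:
        # Tickets the tool would attempt at this threshold.
        attempted = [r for r in runs if r.confidence >= t]
        coverage = len(attempted) / len(runs) if runs else 0.0
        # Acceptance rate is undefined when nothing is attempted.
        accept_rate = (
            sum(r.accepted for r in attempted) / len(attempted)
            if attempted
            else None
        )
        rows.append((t, coverage, accept_rate))
    return rows


# Illustrative data only, not real measurements.
runs = [
    Run(0.95, True),
    Run(0.90, True),
    Run(0.70, False),
    Run(0.60, True),
    Run(0.40, False),
]

for t, coverage, accept_rate in sweep(runs, [0.5, 0.8]):
    print(f"threshold={t} coverage={coverage:.2f} accept_rate={accept_rate}")
```

A higher threshold should push the acceptance rate up while coverage drops, and seeing both columns side by side is what would let me judge whether the tuning is in a useful spot.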

Have you tested it on tickets with ambiguous requirements? That seems like where it would struggle most, but also where the confidence evaluation becomes really important.