Hacker News
by kangraemin 69 days ago
The "sanitised optimism" problem is real. I've seen agents report "fixed!" when they just suppressed the error.

Role separation (builder/reviewer/tester) helps, but the reviewer agent also tends to be too polite. Forcing the reviewer to output exactly one of PASS/FAIL/UNKNOWN, with no room for "looks good overall", is the only thing that worked for me.
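
A minimal sketch of how a strict verdict like that could be enforced on the consuming side, assuming the reviewer is prompted to put the verdict alone on the first line of its reply. The names `Verdict` and `parse_verdict` are hypothetical and not from any agent framework, and this is an illustration rather than the setup actually used above.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


def parse_verdict(reviewer_output: str) -> Verdict:
    """Return the verdict only if the first line is exactly PASS, FAIL, or UNKNOWN."""
    lines = reviewer_output.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    try:
        # Enum lookup by value: matches only the three exact tokens (case-sensitive).
        return Verdict(first_line)
    except ValueError:
        # Polite filler such as "looks good overall" is never promoted to a pass.
        return Verdict.UNKNOWN


if __name__ == "__main__":
    print(parse_verdict("PASS\nAll 14 tests ran and passed."))
    print(parse_verdict("Looks good overall, nice work!"))
```

The design choice here is that anything outside the three tokens falls back to UNKNOWN rather than PASS, so a vague or overly polite reply cannot be counted as approval.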