Hacker News
by llbbdd 54 days ago
Are you using any tools specifically for controlling this behavior that you can recommend? I want to tear my hair out every time Claude cleanly one-shots weeks of work to 99% accuracy, one or two tests fail, and it calmly dismisses them by declaring each one a "pre-existing failure" or "flaky". It can usually fix the actual problem if I then explicitly tell it to stash the changes and compare against the test results from the prior state, but this happens constantly.
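For anyone who wants to script that stash-and-compare check instead of prompting for it each time, here is a rough sketch of what I mean. It is not an existing tool: the function names are made up for illustration, it assumes the changes are still uncommitted, and it assumes the test runner prints pytest-style `FAILED <test id> - <message>` summary lines (test IDs containing spaces would need a smarter parser).

```python
import subprocess


def parse_failed(output):
    """Pull test IDs out of pytest-style 'FAILED <id> - <msg>' summary lines."""
    ids = []
    for line in output.splitlines():
        parts = line.split()
        if line.startswith("FAILED ") and len(parts) > 1:
            ids.append(parts[1])
    return ids


def classify_failures(baseline_failed, current_failed):
    """Separate failures that already existed from ones the changes introduced."""
    baseline, current = set(baseline_failed), set(current_failed)
    return {
        "pre_existing": sorted(current & baseline),
        "introduced": sorted(current - baseline),
        "fixed": sorted(baseline - current),
    }


def failed_tests(test_cmd):
    """Run the test command (a list of arguments) and return the failed test IDs."""
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    return parse_failed(result.stdout)


def compare_against_baseline(test_cmd):
    """Run tests with the changes, then with them stashed, and classify failures."""
    current = failed_tests(test_cmd)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        # Nothing to stash, so the working tree already is the baseline.
        return classify_failures(current, current)
    subprocess.run(["git", "stash", "push", "--include-untracked"], check=True)
    try:
        baseline = failed_tests(test_cmd)
    finally:
        # Always restore the changes, even if the baseline run blows up.
        subprocess.run(["git", "stash", "pop"], check=True)
    return classify_failures(baseline, current)
```

Something like `compare_against_baseline(["pytest", "-q"])` would then report anything under `"introduced"` as a failure that cannot be called pre-existing. It says nothing about flakiness on its own; for that you would have to rerun the same state a few times and see whether the result changes.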