by llbbdd
54 days ago
Are you using any tools specifically for controlling this behavior that you can recommend? I want to tear my hair out every time Claude cleanly one-shots weeks of work to 99% accuracy, one or two tests fail, and it calmly "resolves" them by declaring each one a "pre-existing failure" or "flaky". It can usually fix the real problem if I then explicitly tell it to stash the changes and compare against the test results from the prior state, but this happens constantly.
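For what it's worth, the stash-and-compare check can be scripted so that a "pre-existing failure" claim is verified rather than taken on faith. This is only a minimal sketch, assuming a git working tree with uncommitted changes and a test runner that prints pytest-style `FAILED <test id>` summary lines. The function names (`classify_failures`, `failed_tests`, `compare_against_baseline`) are hypothetical and not part of any existing tool.

```python
import subprocess


def classify_failures(baseline_failed, current_failed):
    """Split current failures into regressions and genuinely pre-existing ones."""
    baseline = set(baseline_failed)
    current = set(current_failed)
    return {
        "regressions": sorted(current - baseline),   # fail now, passed before
        "pre_existing": sorted(current & baseline),  # failed before and now
    }


def failed_tests(test_cmd):
    """Run test_cmd and collect failing test IDs.

    Assumption: the runner prints summary lines of the form
    "FAILED <test id> - <message>", as pytest does by default.
    Test IDs containing spaces are not handled.
    """
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    failed = set()
    for line in proc.stdout.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "FAILED":
            failed.add(parts[1])
    return failed


def compare_against_baseline(test_cmd):
    """Run tests with and without the uncommitted changes, then compare."""
    current = failed_tests(test_cmd)

    # Only stash when there is something to stash; otherwise `git stash pop`
    # would pop an unrelated older stash entry.
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        return classify_failures(current, current)

    subprocess.run(["git", "stash", "push", "--include-untracked"], check=True)
    try:
        baseline = failed_tests(test_cmd)
    finally:
        # Always restore the working tree, even if the test run raises.
        subprocess.run(["git", "stash", "pop"], check=True)
    return classify_failures(baseline, current)
```

Calling something like `compare_against_baseline(["pytest", "-q"])` returns a dict whose `regressions` list holds the tests that fail only with the changes applied. A test that fails intermittently could still land in either list, so this does not settle the "flaky" claim on its own. That would need repeated runs.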