These models do well changing brownfield applications that have tests because the constraints on a successful implementation are tight. Their solutions can be automatically augmented by research and documentation.
I don't exactly disagree with this but I have seen models simply deleting the tests, or updating the tests to pass and declaring the failures were "unrelated to my changes", so it helpfully fixed them
I’ve had to deal with this a handful of times. You just have to make it restore the test, or keep trying to pass a suite of explicit red-green method tests it wrote earlier.