I don't really use anthropic models. But when I tried it with others they can write tests but they never confirm that they fail before they proceed to make implementation that causes them to pass. Maybe I didn't prompt it forcibly enough.
I haven’t tried this (yet), but I’ve heard of people disabling write access to test code while the agent is writing implementation and vice versa. I imagine “disabling” could be done via prompting, or just a quick one liner like: chmod -r 0644 ./tests
no other engineering profession would accept the standards(or rather their lack of) on which software engineering is running.