Absolutely no one is arguing that you shouldn’t have a combination of manual and automated tests around either AI- or human-generated code, or that you shouldn’t have a thoughtful design.
In a non-trivial app you can't test your way through all of the e2e workflows, and thoughtful design isn't what I'm talking about.
How many bugs have you seen that passed your automated and manual testing? Probably 99.9% of them.
Now imagine that you take those same test suites and you unleash an agent on the code that has far worse reasoning capabilities than a human and you tell them they can change anything in the code as long as the tests pass.
If you had an employee who coded 2x faster than everyone else but produced 10x the bugs, would your suggestion be to let him rip and stop reviewing his code?
> I never suggested letting agents code for a day on end. I use AI to code well defined tasks and treat it like a mid level ticket taker
It doesn’t matter how long you’re letting it run. If you aren’t reviewing the output, you have no way of knowing when it changes untested behavior.
I regularly find Claude doing insane things that I never would have thought to test against, and that would have made it into prod if I hadn’t reviewed the code.
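To make "changes untested behavior" concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration (the `apply_discount_*` functions, the tiny test suite, the numbers). It is not taken from any real codebase or from anything Claude actually produced. It only shows how a rewrite can keep a test suite green while changing behavior nobody wrote a test for.

```python
def apply_discount_original(price: float, percent: float) -> float:
    # Original (hypothetical) code: clamp the discount to 0..100
    # so the resulting price can never go negative.
    percent = max(0.0, min(percent, 100.0))
    return round(price * (1 - percent / 100), 2)


def apply_discount_rewritten(price: float, percent: float) -> float:
    # Hypothetical "simplified" rewrite: the clamp has been dropped.
    return round(price * (1 - percent / 100), 2)


def run_test_suite(apply_discount) -> bool:
    # The only cases the (hypothetical) suite covers.
    # None of them exercises a discount above 100%.
    return (
        apply_discount(100.0, 10.0) == 90.0
        and apply_discount(100.0, 0.0) == 100.0
        and apply_discount(80.0, 50.0) == 40.0
    )


if __name__ == "__main__":
    # Both versions pass the suite.
    print(run_test_suite(apply_discount_original))
    print(run_test_suite(apply_discount_rewritten))
    # They disagree on an input the suite never checks.
    print(apply_discount_original(100.0, 150.0))
    print(apply_discount_rewritten(100.0, 150.0))
```

Both versions pass the suite, but for a 150% discount the original returns a price of zero and the rewrite returns a negative price. The tests alone can't tell the two apart. Reading the diff can.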
> It doesn’t matter how long you’re letting it run. If you aren’t reviewing the output, you have no way of knowing when it changes untested behavior.
You’re focused on the output, I’m focused on the behavior. That’s the difference. It’s just like when I delegate a task to another developer or another company, like the random Salesforce integration, or even a third-party API I need to integrate with.
Unfortunately you are not equipped to observe and test all or even most of the behavior of a non-trivial system.
And if you attempt to treat every module in your system like it’s untrusted 3rd party code you’ll run into severe complexity and size limits. No one codes large systems like that because it’s not possible. There are always escape hatches and entanglements.