Validating programs against a formal spec is very, very hard for foundational computational complexity reasons. There's a reason why the largest programs whose code was fully verified against a formal spec, and at an enormous cost, were ~10KLOC. If you want to do it using proofs, then lines of proof outnumber lines of code by 10-1000 to 1, and the work is far harder than for proofs in mathematics (which are typically much shorter). There are less absolute ways of checking spec conformance at some useful level of confidence, and they can be worthwhile, but they require expertise and care (I'm very much in favour of using them, but the thought that AI can "just" prove conformance to a formal spec ignores the computational complexity results in that field).
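To make "less absolute ways" concrete, here is a minimal sketch of one such technique: randomized checking of an implementation against an executable spec. Everything in it is an illustrative assumption rather than anything from a real project: `meets_spec`, `my_sort`, and `check_conformance` are hypothetical names, and sorting stands in for whatever the real spec would be.

```python
import random
from collections import Counter


def meets_spec(inp, out):
    """Executable spec for sorting: output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def my_sort(xs):
    """Hypothetical implementation under test (a plain insertion sort)."""
    result = []
    for x in xs:
        i = len(result)
        # Walk left past every element greater than x, then insert there.
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def check_conformance(impl, trials=1000, seed=0):
    """Return an input that violates the spec, or None if every random trial passed."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        # Pass a copy so an in-place implementation cannot corrupt the reference input.
        if not meets_spec(xs, impl(list(xs))):
            return xs
    return None
```

A `None` result only says that no counterexample turned up in the sampled inputs. That is evidence at some level of confidence, not a proof, and how much confidence it buys depends on how well the input generator covers the cases that matter, which is where the expertise and care come in.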
For most cases we don't need nearly that comprehensive verification. This is expecting more of AI-written code than we ever bother to subject most human-written code to. There's a vast chasm there we only need to even slightly start to bridge to get to far higher confidence levels than the typical human dev team achieves.
> For most cases we don't need nearly that comprehensive verification. This is expecting more of AI-written code than we ever bother to subject most human-written code to.
True.
> There's a vast chasm there we only need to even slightly start to bridge to get to far higher confidence levels than the typical human dev team achieves.
The word "slightly" is doing a lot of work here, to the point of making the size of that gap impossible to estimate. For example, the complexity classes P and NP are only slightly apart, and yet that's where a very practical barrier between feasibility and infeasibility lies. I don't doubt that one day AI may be able to write programs as well as humans, although nobody can estimate how soon that day will come, but nobody knows how wide the gap between that and "far higher confidence" is. Maybe there are fundamental computational complexity barriers in that gap that no amount of intelligence can cross, and maybe there aren't. Nobody knows yet.
What we do know is that anything humans do is possible - after all, we're doing it - and that many things we need but humans can't do (including predicting nonlinear systems like the behaviour of the economy) no machine can do drastically better because of complexity limitations.