Tests only work for a limited set of programming verification. In many cases you don’t actually know what the output for any given input should be, so there’s no way of verifying the AI-generated code. You just kind of have to trust it. The only exception I can think of is robotics and quantitative trading. Which have already been extensively utilizing AI.
Well, if you can formalise the problem statement (this is the hard part) sufficiently well that the computer can produce a proof, you can be very sure the proof is sound.
A fundamental property of any formal proof is that it can be checked by a fairly stupid machine, automatically, because every step is a simple mechanical operation that names one of a handful of axioms and refers to a handful of earlier steps, the truth of which has already been established. So while coming up with a proof may require genius-level thinking, checking an existing fully fleshed out proof is simple -- just potentially very tedious because of the sheer number of steps.
That said, a typical human-written proof omits many steps considered "obvious" to a trained mathematician. Converting this to a formal proof involves interpreting what the original author "must have meant", which requires a lot of expertise and can go wrong -- or it may reveal that there is some inconsistency in the original claim itself.
So… more peer review backlog. That sounds fun. Oh, you want someone to review your paper, Mr phd in mathematics with 20 years of experience? Get in line behind chatGPT.
But I also definitely don't understand the problem if I can't get the computer to understand it, with tests.
In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.
With all of these questions in the air, epistemology might be making a comeback.