In my experience, it usually apologizes and then produces another snippet with a different flaw. When that flaw is pointed out, it will usually go back to the original snippet with the original flaw. It sometimes also insists that it has "run the test suite and they all pass", much like a human programmer who is trying to fake it until they make it, I guess?
Hooking it up so that it automatically runs the code in question and examines the output is a trivial undertaking; many folks have already done this.
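For a sense of how small the "run it and examine the output" half is, here is a rough sketch. It assumes the model emits Python, and `run_snippet` and `feedback_for_model` are hypothetical names made up for illustration. The step that actually sends the feedback text back to the model is left out, since that depends on whichever API you happen to be using.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: run a Python snippet in a fresh interpreter and capture the result."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # separate process, so a crash can't take down the caller
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a hang as a failure rather than waiting forever.
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {
        "ok": proc.returncode == 0,  # non-zero exit code means an uncaught error
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def feedback_for_model(result: dict) -> str:
    """Hypothetical helper: turn a run result into text you could paste back to the model."""
    if result["ok"]:
        return "The code ran successfully. Output:\n" + result["stdout"]
    return "The code failed. Error output:\n" + result["stderr"]


if __name__ == "__main__":
    print(feedback_for_model(run_snippet("print(2 + 2)")))
    print(feedback_for_model(run_snippet("print(undefined_name)")))
```

Using a fresh interpreter per snippet keeps each attempt isolated. A real setup would probably also want a proper sandbox, because it is executing code nobody has reviewed.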