I think the problem should be framed as "why doesn't it feed the errors from the first attempt back in so it can fix them on the second attempt?" rather than "why does it fail to produce a fully correct implementation on the first pass?"
You’ve got to give it a way to verify correctness (e.g. rendering with Playwright and friends) and tell it to use it. It’s not going to create the guardrail for you, but if you provide it with one, the output is much better.
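
To make that concrete, here's a rough sketch of that kind of loop in Python. It assumes the thing being generated is an HTML page and that "correct" just means "renders without console or page errors", which is a simplification. `generate` is a hypothetical stand-in for whatever calls the model, and `generate_with_feedback` and `collect_page_errors` are names made up for this example. Only the Playwright calls (`sync_playwright`, `page.on`, `page.set_content`) come from the real library.

```python
from typing import Callable, List, Tuple


def collect_page_errors(html: str) -> List[str]:
    """Render `html` in headless Chromium and return any errors it raises."""
    # Imported lazily so the loop below also works with a different verifier.
    from playwright.sync_api import sync_playwright

    errors: List[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Uncaught JavaScript exceptions in the page.
        page.on("pageerror", lambda exc: errors.append(str(exc)))
        # console.error(...) messages.
        page.on(
            "console",
            lambda msg: errors.append(msg.text) if msg.type == "error" else None,
        )
        page.set_content(html)
        browser.close()
    return errors


def generate_with_feedback(
    generate: Callable[[List[str]], str],   # hypothetical: wraps the model call
    verify: Callable[[str], List[str]],     # the guardrail, e.g. collect_page_errors
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Regenerate until `verify` reports no errors or attempts run out."""
    feedback: List[str] = []
    output = ""
    for _ in range(max_attempts):
        # Errors from the previous attempt are handed back to the generator.
        output = generate(feedback)
        feedback = verify(output)
        if not feedback:
            break
    return output, feedback
```

You'd call it as `generate_with_feedback(my_model_call, collect_page_errors)`, where `my_model_call` takes the list of errors from the last attempt and puts them in the prompt. The verifier is passed in rather than hard-coded, so you can swap in screenshots, tests, a linter, or whatever guardrail fits.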