| I am still greatly disappointed by the insistence on end-to-end black-box models. In the end, as impressive as these results are, they are fundamentally trending in the wrong direction. All the benefits and certainty (e.g. security, correctness, and reproducibility) provided by theorem provers and model-driven systems are thrown out the window in favour of fast but potentially wrong or insecure results. The worst part of this development is the psychological aspect - humans have a tendency to rely on machine generated results and view those as superior. The disconnect between working code and correct or secure code respectively is going to widen using this approach. A glaring example is found in the blog post: the image manipulation example (7.) contains an error that the author failed to even recognise or mention. Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checker-board pattern. This is clearly neither a mosaic nor image manipulation. It is a very impressive tech demo, but generating actual software that can be trusted and rigorously checked against requirements will end up using a formal description (i.e. programming language, theorems, or modelling akin to UML) anyway. |
Formal methods don't eliminate wrong or insecure results. Formal methods tell you that a program matches the specification when certain conditions are true eg. no bit flips, the computer does not crash, the kernel doesn't kill your process, that allocations succeed, that your program can make progress (the kernel can decide to never schedule your program), etc. You can have bugs in writing the specification where the specification does not match your intention. Even your intention of a system may have vulnerabilities in it. If your specification can't generate code, your code might not match the specification.
Using formal methods slows you down compared to things like testing which can get us most of the way there in less time. Systems can be designed to be robust such that if a machine fails the system keeps on running. If a end-to-end black-box model can get you most of the way there with a sufficiently low number of bugs it may be worth it to use. Time is a limited resource and being 100% correct is not necessarily better than being 99% correct and having extra features.
>The disconnect between working code and correct or secure code respectively is going to widen using this approach.
Not really. People are not going to just start ignoring bugs when they run into them because the software they are using happened to be machine generated.
>Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checker-board pattern
It worked fine for an image I just tried. Just like the prompt it "convert[ed] the image to a 32x32 mosaic." There was no checkerboard, but it may be worth noting that it converted transparent pixels to black.