|
That's certainly a problem, but it's not the biggest problem. Writing a formal spec is no harder than writing tests (and arguably easier). It doesn't have to be perfect -- the more correctness conditions the better. Of course, a complete spec -- one that says, anything that satisfies this is a correct implementation of the system -- is best, but it's really not required to start getting real benefits. I'd say that the biggest problem is the intractability of verification. Xavier Leroy has said that his style of verification -- end-to-end (i.e., global correctness conditions verified all the way down to source- and machine-code) and employing formal proofs -- is only suitable to the smallest and simplest programs (he claims it's suitable to programs hundreds to thousands of lines long, and even he had to take a few shortcuts in CompCert, e.g., he eschewed some termination proofs). Model-checking fares only a little better, and static analysis -- which scales to arbitrarily-sized programs -- can only verify specific, and mostly local correctness properties. Other approaches can tractably verify global correctness conditions, but only against high-level descriptions of the algorithm/system, and don't reach all the way down to code. AFAIK, the most robust, and still scalable, formal verification employed in real-world programs of considerable size is that employed by Altran UK. They verify global properties against a high-level spec (verified mostly with model-checking, and possibly some proofs), plus local properties at the code level (verified mostly automatically with SMT solvers with a small number of manual proofs), but leaving a gap in between. This could very well become easier and scale well, and could become affordable (as it automates relatively well) yet still extremely useful, as end-to-end verification is hardly ever required. They're using Z for the high-level spec and Ada SPARK for the code-level spec, but other tools could be used (e.g. TLA+/Alloy for the high level, and JML/ACSL/Clojure Spec/Spec# for the code level). I've personally successfully used refinement mappings in TLA+, which allow you to move between different levels of the system description, with the different specs as well as the refinement mapping itself checked with a model checker. This is indeed harder than just checking a single description, but still in the realm of affordability. Taking refinements all the way down to the source level is still far from affordable, though. |
This turned out to be a premature optimization, actually! The CakeML compiler did full functional correctness proofs for their register allocator and report that it wasn't significantly harder than Compcert's translation validation.
> Writing a formal spec is no harder than writing tests (and arguably easier).
This, I'm much less sure of. IME, the really hard thing about correctness proofs is that specifications and proofs often require genuinely new ideas that didn't occur in the code.
For example, the spec of something like a sorting routine is that the output returns a list in increasing order, and that the output is a permutation of the input. The concept of a permutation doesn't occur explicitly in the implementation of the sort, and so the need to make this concept explicit can stop learners dead in their tracks.
I taught Agda (a dependently-typed programming language) to undergraduates, and found that Agda-the-programming language was actually easier to teach than Haskell (the IDE is better and there's less magic). But Agda-the-proof-assistant was hard to teach, mostly AFAICT because I don't know how to teach having mathematical ideas well.