Hacker News new | ask | show | jobs
by mafribe 3455 days ago
Summary: CakeML is the first verified optimising compiler that bootstraps.

Side note: Cake stands for CAmbridge KEnt, which is where (most of) CakeML's verification was carried out.

The pioneering project in this space was X. Leroy's CompCert. This was the first verified optimising compiler. More precisely, a realistic, moderatly-optimising compiler for a large subset of the C language down to PowerPC and ARM assembly code.

1 comments

Could you expand on the significance of the first, for those of us not familiar with formal verification?

Is this is a first because it is is theoretically difficult to do, or because it requires a lot of implementation time? What are some key points to read up and understand in order to properly appreciate this result, past the Wikipedia article on formal verification [1]?

Thank you in advance for any elaboration.

[1] https://en.wikipedia.org/wiki/Formal_verification

The problem with verifying realistic compilers is scale. We have known how to do it in principle since forever, and verification of toy compilers is part of textbooks on verification, such as [1], see also [2]. Realistic compilers are very complicated and Leroy's verification of CompCert took several man years for one of the world's leading compiler and verification guys. The purpose of research like CompCert and CakeML is twofold:

- Provide a verified software toolchain for programmers with a minimal trusted computing base.

- Investigate how the cost (in a general sense) of formal verification in general and compiler verification in particular can be lowered, ideally to the point that normal programmers can routinely use formal verification.

The advance that CakeML makes over CompCert is bootstrapping: CakeML can compile itself, while CompCert (being a C compiler written in Ocaml) can't. Simplifying a bit, bootstrapping lowers the trusted computing base.

Maybe Leroy's [3, 4] are good starting point for learning about this field.

[1] T. Nipkow, G. Klein, Concrete Semantics. http://www.concrete-semantics.org/

[2] A. Chlipala, A verified compiler for an impure functional language. http://adam.chlipala.net/papers/ImpurePOPL10

[3] X. Leroy, Verifying a compiler: Why? How? How far? http://www.cgo.org/cgo2011/Xavier_Leroy.pdf

[4] X. Leroy, Formal verification of a realistic compiler. http://gallium.inria.fr/~xleroy/publi/compcert-CACM.pdf

No it goes further than that. They embedded assembly languages & many aspects of computation in HOL then built their compiler. The thing goes straight from logic to assembly with the theorem prover being the TCB outside the specs themselves. Whereas, CompCert was specified in Coq, would probably be extracted to an ML, and then that whole pile of code (hopefully verified to assembly) would do the job. Unless they're doing all the compiles in Coq itself w/ its checker. This is the part I could be really wrong on.

The TCB reduction is huge. Also, seL4 organization built the Simpl embedding of C in that to do "translation validation" (due to Jared Davis) of it straight to or matched against assembly. Skips the need for a CompCert-style, verified compiler altogether. Myreen et al's techniques were also used to verify theorem provers and now hardware.

So, the CakeML effort and its effects are huge. Maybe more so than CompCert given the flexibility & fact that it's a proprietary product now whereas Myreen et al's stuff is open. That's what I said back when I saw it. The prediction was confirmed as COGENT was built on the same technology with amazing results so far in cost of verification:

https://ts.data61.csiro.au/projects/TS/cogent.pml

Thank you.
The problem with optimizing compilers is that you're changing what the user expects to come out. This is very dangerious because you need to be able to prove that what you've generated will work as the non-optimized version.

I know of a few programs that when compiled with -O2 work fine but break with -O3. This is because some optimizations that are applied are just not what the programmer expected or in the older days just broken working code.

Formally verifying the output is difficult because you need to prove the same operation is happening both times even if they are extremely different in what they are doing. I'm assuming that's where the difficulty comes in.

Hence the Vellvm project and its ilk:

https://www.cis.upenn.edu/~stevez/vellvm