They're barely useful for low assurance. Just read the Csmith paper on testing compilers to see the scope of the problem. The solution to what they're really worried about will require (a) a correct compiler, (b) written in cleanly separated passes that are human-inspectable (aka probably not in C), (c) implemented with correctness checks to catch logical errors, (d) implemented in a safe language to prevent, or at least catch, language-level errors, (e) stored in a build system that hackers can't sabotage undetectably, (f) distributed to users through a trusted channel, and (g) initially compiled with a toolchain people trust, with an optional second representation of that toolchain.

Following Wirth's Oberon and VLISP Scheme, the easiest route is to leverage one of those in a layered process. Scheme, especially PreScheme, is easiest, but I know imperative programmers hate LISPs no matter how simple they are, so I include a simple imperative option too.

First, the LISP route. You build an initial interpreter or AOT compiler from basic elements, macros, and assembly code. That's easy to verify by eye or by testing. You then build other features on top of it piece by piece, in isolated chunks, using that original representation until you have a real language. Next, you rewrite each chunk in the real language and integrate them. The result is your first real compiler, compiled with the one you built piece by piece from a root of trust: a tiny, static LISP with matching ASM. You can use that first real compiler for everything else.

Wirth did something similar out of necessity with P-code and Lilith. With P-code, people needed compilers and standard libraries but couldn't write them. They could, however, write basic system code on their own OSes. So he devised an idealized assembly language that anyone could implement in almost no code, plus a few OS hooks for I/O and the like. Then he modified his Pascal compiler to turn everything into P-code, so porting and bootstrapping only required implementing that one thing.
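To make the "almost no code" point concrete, here is a minimal sketch of an idealized stack machine in the spirit of P-code, in Python. The opcode names, the (opcode, operand) tuple format, and the `run`/`COUNTDOWN` names are hypothetical illustrations, not the real P-code instruction set. The only host dependency is the `out` callback, which stands in for an OS I/O hook.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction format: (opcode, operand). Opcodes that take no
# operand ignore it. A toy in the spirit of P-code, NOT its real ISA.
Instr = Tuple[str, int]

# Binary arithmetic ops: pop b, pop a, push a <op> b.
ARITH: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}


def run(program: List[Instr], out: Callable[[int], None]) -> List[int]:
    """Interpret the tiny stack machine; `out` is the only host (OS) hook."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":        # push the literal operand
            stack.append(arg)
        elif op == "DUP":       # duplicate the top of the stack
            stack.append(stack[-1])
        elif op in ARITH:       # pop b, pop a, push a <op> b
            b = stack.pop()
            a = stack.pop()
            stack.append(ARITH[op](a, b))
        elif op == "JZ":        # pop; jump to operand if the value was zero
            if stack.pop() == 0:
                pc = arg
        elif op == "JMP":       # unconditional jump to operand
            pc = arg
        elif op == "OUT":       # pop and hand the value to the host I/O hook
            out(stack.pop())
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc - 1}")
    return stack


# Usage: emit 3, 2, 1 through the host hook, then halt.
COUNTDOWN: List[Instr] = [
    ("PUSH", 3),  # 0: n = 3
    ("DUP", 0),   # 1: loop: copy n for the zero test
    ("JZ", 8),    # 2: if n == 0, leave the loop
    ("DUP", 0),   # 3: copy n for output
    ("OUT", 0),   # 4: emit n
    ("PUSH", 1),  # 5
    ("SUB", 0),   # 6: n = n - 1
    ("JMP", 1),   # 7: back to the loop test
    ("HALT", 0),  # 8
]

if __name__ == "__main__":
    run(COUNTDOWN, print)  # prints 3, 2, 1 on separate lines
```

Porting a program expressed this way to a new machine means reimplementing only `run` and its I/O hook; the instruction list itself comes along unchanged, which is the property that made P-code easy to bootstrap.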
As a result, the P-code system got ported to 70+ architectures/platforms in 2 years.

The imperative strategy for anti-subversion is similar. Start with an idealized, safe abstract machine along the lines of P-code, with ASM implementations. The initial language might be an Oberon subset with LISP-like or similar syntax, just for effortless parsing. The initial compiler is written in a high-level language for human inspection, with equivalent code side by side in the subset language for that idealized ASM; the subset version is designed to match the high-level one. Create the initial compiler that way, then extend, check, compile, and repeat, just like the Scheme version. Because the initial compilers are simple and the final compilers are in a high-level language, anyone can knock them out in just about any language. That increases diversity across the board, since languages, runtimes, stdlibs, etc. are implemented quite differently from one another. Reproducible-build techniques can be applied to the source code and the initial compilation process if one likes. The real security, though, will come from many people having reviewed the bootstrapping code, the ZIP file being hashed and signed, and users being able to check that the source ZIP they acquired matches what was reviewed. Then they just compile and install it.
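The user-side check can be as small as hashing the downloaded source ZIP and comparing it with the digest the reviewers published. Here's a minimal Python sketch, assuming SHA-256 and a hex digest obtained over a channel the user already trusts. Verifying the signature itself (e.g. with GPG) is a separate step not shown, and `sha256_file`/`matches_reviewed` are hypothetical names.

```python
import hashlib
import hmac
import sys


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_reviewed(path: str, reviewed_hex: str) -> bool:
    """True if the file's digest equals the reviewed one (ignores case and surrounding whitespace)."""
    # compare_digest is a constant-time comparison; not essential for a
    # public digest, but it costs nothing.
    return hmac.compare_digest(sha256_file(path), reviewed_hex.strip().lower())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_zip.py compiler-src.zip <published-sha256-hex>
    ok = matches_reviewed(sys.argv[1], sys.argv[2])
    print("match" if ok else "MISMATCH: do not build")
    sys.exit(0 if ok else 1)
```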