Hacker News new | ask | show | jobs
by nickpsecurity 3714 days ago
Interesting project mpu. I like the stuff here:

http://c9x.me/compile/doc/llvm.html

Especially the optimizations that get you the first 70%. I've been asking optimization people as I see them to do a survey of various methods to find the smallest collection that provide the hugest benefit. This will help people writing new compilers and formal verification community get a head start.

Now, back to correctness. Your project aims to be small, simple, and rigorous. That's perfect time to use methods for higher assurance software or design yours to work with them easily. So, I suggest trying Design-by-Contract w/ interface checks, some static analysis tools, or even something like Softbound+CETS that makes C safer automatically. I also usually recommend writing compilers in something like Ocaml as it prevents lots of bugs and easier to do.

Also, if you do Ocaml or safe imperative, you can always do two implementations side-by-side in the safe language and in portable C. You run tests, analysis, etc available for each one. Problems in one might be problems in the other. Regardless, though, you get the benefits of the safer language by default with a C implementation if you absolutely need it. Many of us have used this with Sandia Labs even doing it for hardware with ML and hardware language of a Java processor that came out great.

So, I challenge you to try to push the envelope by using some available techniques and tools to boost the assurance of your code. Preferably getting it off C, too, due to all the errors it (esp pointers) introduce into the compilers.

2 comments

My qbe C frontend is actually written in myrddin which is a lot like rust/ocaml.

To be honest, C is not well suited to places where there is adversarial input such as servers, but when it comes to logical correctness of code I do not see amazing benefits from languages like ocaml.

I'd usually ignore the language. Yet, given your interesting resume, there could be some practical decisions and decent code in the compiler worthy of a look in near future. ;) It indeed has some benefits from Ocaml and safer than plain C. So, good work covering that detail.

I'll try to address this, though:

"but when it comes to logical correctness of code I do not see amazing benefits from languages like ocaml."

First, why Ocaml is good for compilers. Most of this is still true and why Rust team used it:

http://flint.cs.yale.edu/cs421/case-for-ml.html

What I can't overstate, already in that link, is how much easier it is to verify the properties of ML languages. They were originally designed to write a theorem prover IIRC. SML had a formal semantics for easier modeling. It allowed for functional programming style that avoids state management issues while (esp Ocaml) still lets you get hands dirty where needed. Modifications for tracking information flow, dependent types, concurrency... all sorts of things were pretty straight-forward. Had a certifying compiler early. Relevant to logical correctness, it's easier to map functional specifications to a ML than a mess like C. This was proven when people doused Leroy et al with praise over first, verified compiler for C because it really was that hard. Likewise seL4 matching functional specs to C kernel code took man-years of effort.

So, languages like C are really hard to get logically correct. Languages like SML/Ocaml are pretty easy to get it done right if you understand the problem. Rarely the language causing your issues. A hybrid like a modified C or Modula that has ML-like advantages without C's disadvantages brings you closer to that ideal while preserving performance & control.

just to be clear, I'm not the author of qbe, but I have followed it since it started. The author uses the handle mpu.
I was briefly confused but gathered that. The only other confusing thing was that mpu works with major players in high-assurance per a comment but didn't respond to only comment (mine) about applying assurance tech. Wasn't bothered but didn't expect it either. Unusual.
I actually decided to take more time to answer your comment more throughly than others. Also, I TA'd twice the class from where you linked the article below, so I know about it :).
Great! One of my main reasons posting here is getting next generation of high assurance developers info they need plus learning from them in what's not my specialty (esp formal verification). Just hate missed opportunities given I rarely run into people that even know what the phrase means or why it matters. ;)
Any suggested reading material on writing two implementations side by side like that?
Hansen et al were first I think http://brinch-hansen.net/papers/

Notes: He sprinkles references to method in various papers. Design Principles is one I think. He says the work like COBOL compiler and RC 4000 was coded in assembler. Yet, he and compiler group wrote it by hand in ALGOL to better analyze, structure, and extend these systems. So, an ALGOL form for understanding and hand-written assembler for optimal execution.

Gypsy Verification Environment for high-assurance security - Overview and Example https://www.cs.rice.edu/~dwallach/courses/comp527_s99/gypsy-... ftp://ftp.cs.utexas.edu/pub/AI-Lab/tech-reports/UT-AI-TR-78-11.pdf

Notes: Gypsy was one of more developed methods for high-assurance security back in the day. Used in A1-class SCOMP system and more. The usage in A1 was to do two versions of system in parallel: one in Gypsy for use with provers, info flow analysis, and so on; one in actual code that would run on system. Problems in one could often tell you something about the other.

4GL combining BASIC, LISP, and C http://lost-in-triple-HD-failure :(

Notes: This was mine cuz I hated C and C++ but liked industrial BASIC's I started with. Made a dumb, but effective, translator from one BASIC to C and/or C++ (memory fuzzy there). Let me iterate rapidly & safely then have portable C + safety checks automatically in when done. Started adding custom commands 4GL-style to eliminate boilerplate. Upon finding LISP, but not really learning it, I ported tools so I basically coded a 4GL BASIC in LISP w/ per-function compilation, easier parsing, macros for easy 4GL stuff, and extraction to C or C++. Always two forms of the code in place: one for development/analysis; one for production. Crazy RAD pace w/ usually safe, efficient results. I really miss my tool as I'm too brain-damaged to recreate it now. Nim has many of its attributes, though, so there's hope. :)

Adam Chlipala's Certified Programming book/work and Bedrock http://adam.chlipala.net/

Notes: This method is quite loaded. You can use a theorem prover and dependently-typed language to express your program in a way that catches all types of problems. Usually extracts to ML that you might hand-optimize. Alternatively, reminiscient of Hansen, you can do low-level coding in Coq via Bedrock that's basically a safer, assembly language. Microsoft's Coq ASM and another's TALC typed ASM do similar stuff.

seL4 verification method http://concrete-semantics.org/

Notes: Someone told me above book teaches you what skills they used. Google AutoCorres for C-to-HOL extraction part. Anyway, their method is to develop in Haskell and C simultaneously for separation kernel. The Haskell forms the abstract specification that easily feeds into HOL for analysis of basic function and security. The C obviously is the running code. They go further than Gypsy by using AutoCorres to extract HOL from C then proving Haskell and C HOL specs are equivalent. Might look for C-level problems, too, but don't recall.

Bootsafe - Develop Forth firmware in Java for safety http://slideshowes.com/doc/437272/efficient-code-certificati...

Notes: One of military's funded projects to improve security. Open Firmware specifically. Many firmware, including OF, is in the flexible and fast but unsafe language Forth. The Java language is safer, easy to analyze, easy for experienced to compile, and has no business in firmware w/ its big-ass runtime. :) BootSafe lets security engineers code and analyze firmware in Java then extract that in a verified way to executable Forth. Online presentations are all partial and crappy w/ this most detailed I found. It's a finished product from atcorp.com, though. Notice one could manually do this, too, w/ side-by-side developments with C, C++, Java, or SPARK. Further evidence is that many embedded MCU developers code in C or C++ with mockups of MCU assembler or interfaces. Claim much better results in maintenance & defect reduction.

Another old one of mine - safe or fast? choose two (no website link)

Notes: The idea was that many development or analysis tools should be implemented in as safe and sound a fashion as possible. However, that usually meant sloooow. Development, to maximize flow, needs to give immediate results to programmer if possible. So, for each tool or app, build two versions of it with logical equivalence: safe and fast. The fast is used for day-to-day development. The safe one is used overnight or if problems show up in fast that are resisting debugging (eg might be in compiler/lib). Example combos included MLton with FLINT or CakeML compilers, Racket dialect with PreScheme/VLISP, and CompCert C and/or Softbound+CETS with GCC/LLVM. Batches run overnight to create safe versions of any changes or commits. Batches also re-run tests from fast versions to ensure they match safe versions.

Another of my proposals - C, Java, Ada/SPARK, and ML/Haskell (no website link)

Notes: There's many static, dynamic, and formal analysis tools for C that catch specific types of stuff. Same for Java including basically all the great analyzers for concurrency errors. The best pushbutton analysis is in SPARK for Ada which also has design-by-contract for interface checks. There are ML's with dependent types, easy covert channel analysis, easier formal verification, and so on with Haskell having similar tools. One of my concepts... for where it matters most... was implementing algorithm or system in at least two of these simultaneously with a max of four. In lowest common denominator and consistent way of course. Then, apply at least two of best tools for each language to system. Any flaws found in one are checked in others. Result, from eyeballing to static analysis to auto-testing, will probably be the most thorough assurance you can get without being a formal verification expert. You can do that too, though. Not sure how practical it is. I think it would be easy with 4-5 person team of specialists w/ good communication with probably interesting results.