Hacker News new | ask | show | jobs
by kaba0 1812 days ago
Formal verification techniques don’t scale, not even to your average CRUD app, let alone to an OS-scale.
5 comments

In a number of ways I'd argue that FV of an average CRUD app is a very hard problem, for two reasons:

- the requirements generally change quite frequently

- the requirements are generally a long way from being "formally verifiable"

Where I, personally, have found the most value so far using FV techniques is from modelling "tricky" things whose requirements are quite solid. A few examples from my work:

- A state machine for managing LoRa communication. The requirements in the LoRa spec are pretty good. I took those requirements and designed a state machine I could implement on a low-power device and modelled that SM in TLA+ to verify that there were no deadlocks/attempted transitions to invalid states.

- A state machine for managing the Bluetooth communication between an Android app, a Bluetooth-Iridium (satellite) bridge, and the server on the other side of the Iridium link. Similar to the LoRa case, I used TLA+ to ensure that there were no deadlocks in my proposed SM (and uhh made a number of corrections as I went). Ultimately, due to the design of the BT/Iridium gateway, the exercise resulted in the conclusion that it wasn't actually possible for this process to be 100% reliable but it was possible to detect and retry the edge cases.

- Modelling a somewhat awkward email/SMS invite flow for a mobile app. You could get an invitation over email or SMS, and your account transitioned through a few different states as both your email address and your phone number were verified. Modelling this helped ensure that your account couldn't get into a state where no forward progress was possible or into a state where you could use a half-completed account.

Thanks for the informative answer!

Just a note, I (personally) don’t consider TLA+ a FV technique — it is a really good technique that actually scales much better than formal verification programs like Agda or Coq.

It's formal verification of an abstract machine as opposed to running code. The problem is that formal methods as a field doesn't yet have a rich enough vocabulary to distinguish between its flavors. I use "code verification" and "design verification", which is a little more specific, but has its own issues.

Is TLA+ a sandwich?

> TLA+ is a formal specification language developed to design, model, document, and verify concurrent systems.
But it only proves a model, not the actual code — that’s a major distinction.
You can't write a bug and nor can you write a test if you don't even have proper spec. But it happens a lot when requirements are just ambiguous.
https://github.com/Componolit

Let us see what they have, using formal verification (Ada/SPARK).

- JWX is a library for handling JSON data and more. It is implemented in the SPARK programming language and has been proven to contain no runtime errors. As a result, JWX is particularly suited for processing untrusted information.

- The RecordFlux specification language is a domain-specific language to formally specify message formats of existing real-world binary protocols. Its syntax is inspired by Ada. A detailed description of the language elements can be found in the Language Reference.

- Formally verified, bounded-stack XML library

This is just Componolit.

Let us check the F* language:

- F* (pronounced F star) is a general-purpose functional programming language with effects aimed at program verification. It puts together the automation of an SMT-backed deductive verification tool with the expressive power of a proof assistant based on dependent types. After verification, F* programs can be extracted to efficient OCaml, F#, C, WASM, or ASM code. This enables verifying the functional correctness and security of realistic applications. The main ongoing use case of F* is building a verified, drop-in replacement for the whole HTTPS stack in Project Everest. This includes verified implementations of TLS 1.2 and 1.3 and of the underlying cryptographic primitives.

---

Given the above, what exactly do you mean that it does not scale?

Whoops, I completely forgot about libsparkcrypto (https://github.com/Componolit/libsparkcrypto) which is a "formally verified implementation of several widely used cryptographic algorithms using the SPARK 2014 programming language and toolset. For the complete library proofs of the absence of run-time errors like type range violations, division by zero and numerical overflows are available. Some of its subprograms include proofs of partial correctness.".

Features:

* AES-128, AES-192, AES-256

* AES-CBC (all supported AES modes)

* SHA-1

* HMAC-SHA1

* SHA-256, SHA-384, SHA-512

* HMAC-SHA-256, HMAC-SHA-384, HMAC-SHA-512

* PRF-HMAC-SHA-256, PRF-HMAC-SHA-384, PRF-HMAC-SHA-512

* RIPEMD-160

* HMAC-RIPEMD-160

* ECDSA, ECGDSA

It is pretty awesome. :)

Looking at the examples you've provided, I'm pretty sure the Linux kernel's codebase dwarfs their combined codebases into insignificance. Since it is very hard to not have NP-hard asymptotic complexity when formally verifying a codebase as large and as demanding (in terms of performance, etc) as the Linux kernel, I would expect that if you tried to rewrite Linux in the languages you've mentioned and formally verify the rewrite, you will be able to keep the entire AWS cloud (globally) very busy for a very, very long time. On the other hand, if you were to do the rewrite in Rust and use some hypothetical formally verified Rust compiler to type check it (which is mathematically equivalent to such a formal verification in, say, F* limited to only verifying certain properties), it'll probably type check in minutes or even seconds on the laptop I'm currently writing this reply on.

For a less hypothetical example. RedoxOS has a codebase that dwarfs Sel4 (but it is still much smaller than Linux in turn), yet it type checks with much less compute cost than it takes to formally verify Sel4. Or heck, Redleaf actually formally verifies in a tiny fraction of the time it takes to formally verify Sel4, because the Rust type checker not only significantly reduces the amount of requirements that still needs to be formally verified but also reduces the complexity to verify those requirements to linear to the codebase size due to the restrictions that the Rust type system and module system puts on valid Rust programs (with a little bit of care when writing the code). Of course, until you have a certified Rust compiler there might still be miscompiles of Redleaf and until all the features of Rust used in the Linux kernel is formally specified, we cannot do the same with the Linux kernel as for Redleaf even if somehow all of Linux got rewritten in Rust. But in Rust at least it is conceivably possible given sufficient resources and sufficient time. I very much doubt it will ever be possible without help from a sufficiently expressive and practical type system.

Oh yea. Then there's another problem. Most (all?) of the examples you've mentioned have requirements that are well specified and pretty much set in stone. The requirements fro the Linux kernel... aren't quite as obliging.

Sel4 is an OS that is formally verified.

It's security parameters are well-defined enough that you can use it to launch isolated VMs with... less secure OSes that are sandboxed from each other.

I was under the impression that SeL4 exists.
That's true, but as a microkernel, seL4 has a very limited scope. It's not in the same ballpark as, say, the Linux kernel. It's more like a hypervisor. [0]

From a quick google, seL4 has around 10,000 lines of C code, whereas the Linux kernel has around 30,000,000.

[0] https://sel4.systems/About/seL4-whitepaper.pdf

Kinda. https://news.ycombinator.com/item?id=27232401

I think CSIRO disbanding the team was a huge mistake, but in any case they weren't producing the kind of success metrics that sustained their funding.

I know of at least one other company that is picking it up.
Testing is hard. And (very) expensive.
Formal verification is even harder. And even more (very) expensive.