Hacker News new | ask | show | jobs
by atomicnature 975 days ago
A few years back, I was trying to find out how to reduce mistakes in the programs I write.

I got introduced to Lamport's TLA+ for creating formal specifications, thinking of program behaviors in state machines. TLA+ taught me about abstraction in a clear manner.

Then I also discovered the book series "software foundations", which uses the Coq proof assistant to build formally correct software. The exercises in this book are little games and I found them quite enjoyable to work through.

https://softwarefoundations.cis.upenn.edu/

3 comments

Code correctness is a lost art. I requirement to think in abstractions is what scares a lot of devs to avoid it. The higher abstraction language (formal specs) focus on a dedicated language to describe code, whereas lower abstractions (code contracts) basically replace validation logic with a better model.

C# once had Code Contracts[1]; a simple yet powerful way to make formal specifications. The contracts was checked at compile time using the Z3 SMT solver[2]. It was unfortunately deprecated after a few years[3] and once removed from the .NET Runtime it was declared dead.

The closest thing C# now have is probably Dafny[4] while the C# dev guys still try to figure out how to implement it directly in the language[5].

[1] https://www.microsoft.com/en-us/research/project/code-contra...

[2] https://github.com/Z3Prover/z3

[3] https://github.com/microsoft/CodeContracts

[4] https://github.com/dafny-lang/dafny

[5] https://github.com/dotnet/csharplang/issues/105

The issue is that modern software fails because it's part of a complex system with many moving parts, rather than, it is inherently complex at the source-level.

Certain sorts of algorithmically complex development (games, cars, medical hardware, etc.) would benefit from a 'closed-world verification' -- but that's not most software, and they have alternatives.

'Code correctness', including unit testing, ends up being a big misdirection here. What you need is comprehensive end-to-end tests, and instrumentation to identify where failures occur in that end-to-end.

The effort to source-level-check source-level-code is largely a huge waste of time and creates an illusion of reliability which rarely exists.

> The issue is that modern software fails because it's part of a complex system with many moving parts, rather than, it is inherently complex at the source-level.

A reminder of Gall's law:

> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.[8]

* https://en.wikipedia.org/wiki/John_Gall_(author)

Strong disagree.

> The issue is that modern software fails because it's part of a complex system with many moving parts, rather than, it is inherently complex at the source-level.

The choice to run a system as many different moving parts is a decision taken by the team in order to avoid failure.

> Certain sorts of algorithmically complex development -- but that's not most software

It's all software.

> 'Code correctness', including unit testing, ends up being a big misdirection here. What you need is comprehensive end-to-end tests, and instrumentation to identify where failures occur in that end-to-end.

No and no. I have comprehensive end-to-end tests. They take forever, don't fit into RAM (for some services I need to run them on my home PC because my work laptop only has 16GB), and most importantly: they show that the code is not correct. Now I have to change incorrect code to correct code (while not breaking any public interfaces. I wish my predecessors did not put incorrect code into the live system.

I'm not really sure what you're saying here -- the parent commenter is talking about closed-world source-level code verification.

My claim is that these techniques help in relatively few areas of software development. In the main, software is built with open-world interactions across a range of devices (network, disk, etc.) where those interactions dominate the net failure modes of the system.

In the latter case, source-level verification is not only of no help, but it's a huge resource misallocation. You end up writing tests that just create the illusion of success; and then the whole thing constantly falls over.

Spend all that time instead in designing tests that do fit into ram (eg., sample your datasets); and instrumentation that does reveal errors (eg., sampling of memory, requests, etc.).

One of my first software jobs was writing a reverse proxy -- one of my co-devs wrote unit tests that simply established the in/out of various functions was "as expected". Pretty useless -- the issue is whether the proxy actually worked.

Likewise most source-level correctness efforts are 'testing theatre'.

Devs are looking for cargo-cult solutions they can copy/paste. No, testing software is an actual area of development -- and you need to develop tests, not cargo-cult

Tests only show that it is correct for the sets of values and code paths you exercise. It's quite possible for other aspects to be incorrect, and this is what theorem provers like Coq help with.
If you have an incomplete or buggy specification Coq won't actually prove the absence of bugs.
This is true but mostly meaningless in practice. I have never found the people who say this every time formal verification comes up to be the same people who actually work in formal verification. The reality is that it's far far harder to mess up a formal spec that you actually have to prove a concrete instance of, than it is to mess up the code. With a buggy spec, in the course of trying to prove it you'll arrive at cases that seem nonsensical, or be unable to prove basic lemmas you'd expect to hold true. Formal spec bugs happen, but they're quite rare. Usually even when there are edge cases that aren't covered, they will be known by the authors of the spec due to the dynamics I mentioned earlier (cases encountered during the proof that are clearly being solved in ways that don't follow the "spirit" of the spec, only the letter). Of course, "informal" specs, or paper proofs, will have subtle bugs all the time, but that's not what we're talking about here.
"Code correctness is a lost art."

No, it really isn't. It's doing better now than it ever has... and I mean all the negative implications of that. It was not an art that was ever "found" in the sense you mean.

Note that the Z3 SMT solver was written by Leonardo de Moura, who also is the lead dev of Lean 4. Not a coincidence (-;

Lean 4 seems to be used in production at AWS: https://github.com/cedar-policy/cedar-spec/pull/138

Or using F* and then generate F# code, https://www.fstar-lang.org/
Formal verification is often not the best tool for ensuring code correctness from an ROI perspective. Things like unit tests (including property based tests) and ensuring 100% code coverage often achieve adequate results with less effort.
The key point is that most software don't need correctness to the level formal verification provides. There's a subset of software however, for which there's no substitute for a formal verification process.
Additionally, formal verification usability is an area of constant research, and the set of software for which it is the best ROI increases over time.
If you can specify your program (or a part of it) well enough to prove it correct, you can also use that specification for property-based testing.
Ada is gaining popularity in safety critical systems
Ada/SPARK is already popular in safety critical systems
Have you looked into Idris2 at all. While looking into these theorum provers, it always felt like they had an impedance mismatch with normal programming.

Idris2 portends to a general purpose language that also has a more advanced type system for the theorum proving.

https://github.com/idris-lang/Idris2

That's interesting; no I wasn't aware of it. Will check it out sometime, thanks.
I had the same positive experience with Software Foundations.

There is another book somewhat derived from it (if I understand correctly) using Agda instead of Coq: https://plfa.github.io/

I haven't had the chance to go through it yet, but it's on my list - I think Agda (and as mentioned by another commenter, Idris) is likely to feel more like a programming language than Coq.