Hacker News new | ask | show | jobs
by jp57 514 days ago
In grad school I took a formal methods class where we proved properties about programs that completely changed how I think about bugs. The main things I took from the class were

1. Correctness of a program is distinct from its performance.

2. Program correctness can be proven.

3. Optimizing for performance often makes it harder to prove correctness.

I do not actually use formal methods in my work as a developer, but the class helped improve my program quality nonetheless. Now I generally think in terms of a program being correct rather than having no bugs. Technically these are the same thing, but the change of language brings a change of focus. I generally try to use the term "error" instead of "bug", for an incorrect program.

My strategy is to write the simplest correct version of the program first, convince myself that it is correct, and then optimize, if necessary without regressing on correctness. I generally use tests, rather than formal proofs, though, so of course there is still the possibility of uncaught errors, but this strategy works well overall.

Thinking this way also gives me guidance as to how to break down a program into modules and subprograms: anything that is too big or complex for me to be able to reason about its correctness must be subdivided into pieces with well-defined correctness.

It also has clarified for me what premature optimization means: it is optimizing a program before you know it's correct.

(EDIT: fixed "reason about its complexity" to say "reason about its correctness" in the penultimate paragraph.)

6 comments

Correctness of a program is usually distinct from performance.

One obvious exception is code processing secret data. There, performance variance creates observable side-effects (timing side channels) which can be used to determine the secrets' values.

Another is any sort of hard real-time system, where performance is critical to correctness. For example, a brake-by-wire system that took 10 seconds to respond to the pedal being pressed would be incorrect, because of poor performance.

Otherwise, I agree. There might be some other exceptions, but striving for correctness first is a good way to write code.

Even for hard real-time systems, focusing on correctness first is often the right way to go. At least in my experience, it's typically been much easier to make a correct system fast than a fast wrong system correct. With some particularly delightful (read: catastrophic) results when some folks really wanted to push their fast wrong code (fixed years ago, fortunately, and they had a good corporate culture change not long after so no point in naming and shaming).
I agree. But performance can't be ignored, and for some systems like those I mentioned it's not distinct from correctness. Performance doesn't always mean "as fast as possible", e.g. for systems dealing with secret data it means "without leaking information via side channels", for something like petting a window watchdog it means "slow enough not to pet before the window opens, fast enough not to pet after the window closes".
Achieving Correctness is really satisfying. however it is hard and difficult. IMOH this does in general polarize the scene (proving fanatics on one extreme and the other side who are not even testing the code) IMHO flushing out what you are designing does help and goes along the way of having fewer bugs. one old relatively yet easy and accessible formal toolkit which helpful in flushing out process is Z notation . one of the accessible books, old yet an easy read and rewarding is

"Software Development with Z. A practical approach to formal methods in software engineering"

https://archive.org/details/softwaredevelopm0000word/page/n1... .

there are other notations developed later. but its simplicity and easiness even while scribling on paper or word processor gets me back to using it every now and then

I've a good track record of having my programs work without bugs, I don't think it's too hard. The way I work is to restrict myself to using building blocks that I know work well and produce correct results. For example: using state machines, never breaking out of a loop, tackling all the edge case before the body, using simple data structures, don't use inheritance or exceptions, don't interact with files natively, don't use recursion, etc. etc.

When I face a programming problem I map the solution to those simple building blocks, and If I can't I try to change the problem first.

Formal methods are hard if you want to prove the correctness of a hard algorithm, but you can almost always do without a hard algorithm and use something really basic instead, and for the basic things you don't need formal methods.

The people who write the most bugs in my experience do it because they don't fully understand the building blocks they're using, and rely on other things for correctness like the type checker or unit tests. They view code as a stochastic process, write code they don't fully understand and have accepted the resulting bugs as a fact of life.

It is a great time to work with proofs and formal verification; so many nice tools with different goals like idris, agra, lean, coq, tla+, etc, but also unit and system tests, however generally clients don't pay for that; they pay for sloppy and buggy.
> Program correctness can be proven.

It can be proven, in broadly the same sense as all of the atoms in your body can simultaneously "decide" to exist an inch to the right due to quantum field theory.

It is not practical to prove correctness for the vast majority of programs, and there are programs that are demonstrably incorrect, cannot be made correct, and yet are useful and still function anyway (e.g., closing TCP connections cannot be done "correctly").

> It is not practical to prove correctness for the vast majority of programs,

That is the excuse i hear a lot from software developers, when everything they do is to test the expected behaviour of a program, without any edge case.

And this is also the reason why iMessage and WhatsUp are full of one click exploits.

The comment you were replying to is about formal proofs of correctness, probably using tools like Rocq (formerly Coq) and TLA+.

These tools can take an extreme amount of time to use and require specialised skillsets that are difficult to hire for. They go far beyond standard software engineering practices like unit and E2E testing. It is genuinely, seriously not practical to verify the average startup's SaaS web app with TLA+; you'd go bankrupt.

Someone who's used them more than me might be able to comment on exactly how hard it would be, but I felt it was unfair to you not to explain.

It is possible that the person you replied to wasn't confused by that. I've certainly come across the type of developer who'd deride the lack of ability to formally test their code then, in the next breath, click through the happy path of their code and send it.
TLA+ and Coq are too hard for most SAAS applications. Tried FizzBee (https://fizzbee.io/tutorials/getting-started/) and P (https://p-org.github.io/P/whatisP/)? These seems quite practical.

Coq/Rocq are obviously much harder to use, and I mostly see embedded systems engineers use them. TLA+ is somewhat easier, and being used in a number of cloud infrastructure companies (AWS, MongoDB, Confluent etc).

Obviously, if you are building a simple CRUDL app, these tools don't help at all.

Oh yeah, just prove it correct. Simple

Wait, code is not simple. Are the requirements simple? Oh wait, in reality they aren't simple either.

The complexity of the formal specification must ramp up with the complexity of the code, because the task being asked is complex.

So where does the debugging end if the specification is wrong?

How do you prove the requirements are correct?

Unit testing is helpful but wasn't the panacea, because the most important bugs are in integration.

I'm also going to guess that formal proofs will have hellacious issues with system boundaries. They can't prove correctness of the state on the other end of the pipe.

So you overcome that, time and money. What happens when the requirements/specification changes? How much of the verification is invalidated?

Proof systems need to prove their economic validity in the he software realm. Heck, where are the proof systems for subsections of the Linux kernel or other vitally important inner loops of computing? Those actually have stable specifications. Weren't go multitasking formally verified/provably correct in some sense? Why haven't correctness tools tackled other things?

Correctness is also distinct from applicability/usefulness. A program that lacks some important edge cases is marginally more useful than a program that takes impractical amount of time for most inputs. So "Correctness of a program is distinct from its performance" doesn't imply "performance is optional".
When you convince yourself of the program correctness, are you using techniques from the class?

Any advice on reasoning about correctness?