Hacker News new | ask | show | jobs
by Hercuros 2501 days ago
First of all, I would agree with you that full formalization of program correctness using dependent types is generally extremely costly in terms of time and money, and that there are often other formal methods which might get you results that are "good enough" while requiring much less effort and expertise. However, I do think that there are some inaccuracies in some of the statements you are making about proof assistants based on dependent types.

One major benefit that proof assistants have over other formal methods is that they can be used to prove essentially arbitrary properties of programs, which is not true of any automated methods. Basically, if we as humans are able to convince ourselves that a program has a certain property, then we could also show this using a (dependently-typed) proof assistant.

(Yes, theoretically all proof systems have limitations due to Gödel’s incompleteness theorem, but this limitation applies just the same to ALL formal methods. Furthermore, you’re not that likely to run into it in practice unless you are doing metatheory, and even there many interesting statements CAN still be shown.)

Consider the formally verified C compiler CompCert, where the assembly output of the compiler has been proven to have the same behavior as the input source code. A proof like this simply cannot be done using automated methods currently, and requires human assistance. This kind of correctness proof is also highly non-compositional, since the correctness proof of a compiler does not mirror the structure of the compiler very closely. Hence this also invalidates your point about dependent types not being able to handle non-compositional properties very well. There are plenty of other examples, like the seL4 microkernel.

I also do not understand what bearing computational complexity results have on humans manually formalizing their informal intuitions about program correctness in a proof assistant. Humans do not work on arbitrary problems where they have no intuition at all about why a program is correct, hence referring to worst-case computational complexity results does not make much sense here. In fact, if you have no intuition about why a program works, then how did you manage to write it in the first place? You surely did not write random garbage in the hopes that you would get a working program.

You might say that there are programs you can check using a model checker, but which humans cannot reasonably keep in their heads or form an intuition about, and therefore these proofs cannot formalized in a proof assistant. But what you can do is implement a model checker in your proof assistant and then show that the model checker is correct. After you've done that, you can simply use the model checker to do your proofs for you. And of course it should be possible to formalize the model checker itself, since the people that wrote the model checker must have some intuition about why it works and produces the correct results. In this sense, proof assistants subsume all other formal methods, since they can be "scripted" to include them.

Furthermore, it is generally not the case that you have to rewrite your program in case you want to prove some more properties about it. You may be thinking of a programming style where the type of a program encodes all the properties you are interested in, but you can avoid this style where necessary, by separating your program and its correctness proof.

1 comments

> One major benefit that proof assistants have over other formal methods is that they can be used to prove essentially arbitrary properties of programs, which is not true of any automated methods.

It is true of model checkers.

> There are plenty of other examples, like the seL4 microkernel.

Yep, there are plenty of such examples, I'd say 2 or 4, and they're all less than 10KLOC, i.e. 1/5 of jQuery.

> I also do not understand what bearing computational complexity results have on humans manually formalizing their informal intuitions about program correctness in a proof assistant.

Because humans can't outperform Turing machines.

> Humans do not work on arbitrary problems where they have no intuition at all about why a program is correct, hence referring to worst-case computational complexity results does not make much sense here.

For one, the results aren't just worst-case. One of the most important ones is on parameterized complexity. Second, problems that are very hard in the worst case have rarely been proven easy in the "real world" case, unless the real world case was somehow categorized in a neat subset, and this isn't the case here. It's possible that we're usually far from the worst case, but if so, automated methods could also exploit that, and still be better than manual ones in most cases.

> In fact, if you have no intuition about why a program works, then how did you manage to write it in the first place? You surely did not write random garbage in the hopes that you would get a working program.

This is a common assumption, but a wrong one: https://pron.github.io/posts/people-dont-write-programs

> In this sense, proof assistants subsume all other formal methods, since they can be "scripted" to include them.

The same can be said about all other formal methods. Indeed, various sound static analysis tools like Frama-C or optionally fall back to manual proofs when needed. In the end, though, in practice deductive proofs don't scale as well as other methods. It's a good tool to have in your toolbox, but you don't want it as your default. Your model checker could just as easily ask for help from a proof assistant. So it's not the case that any one method subsumes all others, but you could combine them. Which is why dependent types are problematic, as they make these combinations harder than, say, contract systems, that are flexible in the choice of verification method.

> You may be thinking of a programming style where the type of a program encodes all the properties you are interested in, but you can avoid this style where necessary, by separating your program and its correctness proof.

True, but that's not the style discussed here, of dependent Haskell or Idris.

I agree that verifying programs using dependent types is impractical for large programs. I think that the reasons are mostly practical rather than theoretical, however. There's too little benefit for too great a cost if you want essentially 100% correctness guarantees for every single part of a large, existing program.

Humans regularly do "outperform Turing machines." Most of the complicated mathematical proofs that have been produced by humans, cannot currently be automatically produced using an algorithm in reasonable time. Pick almost any non-trivial mathematical theorem, and you would be hard pressed to write a program that could prove it in a reasonable amount of time given just a list of the axioms. In theory, humans might not be able to outperform TMs, but in practice they regularly do, as attested to by the large amounts of mathematical proofs that we have produced by hand.

Model checkers also outperform humans for some tasks, but then again this is because human intuition has been used to write the model checker in the first place. My point is that using theoretical computer science results to talk about the difficulty of finding mathematical proofs in real-world cases is a bit of a non-sequitur. Yes, the Linux kernel would be practically impossible to verify completely using semi-automated tools, because it would take too much time and effort. But is this because of some theoretical result about the model checking problem? I don't think so.

One difference between having a model checker call a proof assistant and using a proof assistant to implement a verified model checker is that if you write a verified model checker implementation, you can be almost 100% certain that it's bug-free, because it creates proof terms that can be independently checked by the (ideally small and simple) proof kernel of the proof assistant. You don't have to trust that the model checker has been implemented correctly. This difference does not matter that much in practice, however, since model checker implementations probably receive a lot of testing and are quite trustworthy.

You're right that I am mostly talking about Coq, and that a different style tends to be used in Haskell/Idris. Note that dependent Haskell is actually logically inconsistent for backwards-compatibility reasons anyway, so there's not much to lose there by cutting a few corners for pragmatic reasons. It is not necessary to verify every single part of the program. Dependent types in Haskell gives you the freedom to do more. You can choose not to use that freedom where it doesn't make sense.

> Most of the complicated mathematical proofs that have been produced by humans, cannot currently be automatically produced using an algorithm in reasonable time.

That's true, but humans have not done that in a time frame that's even remotely reasonable for verifying programs. Automatic verifcation methods outperform humans by orders of magnitude, on average, as far as program verification is concerned. My point is that if it is true that real-world programs are far from the worst case -- and note that we do not currently have much evidence to support that belief, because we have not been able to regualrly and affordably verify programs with deductive proofs -- then automated tools will be able to exploit that as well (once we know what "that" is, assuming "that" exists at all).

> My point is that using theoretical computer science results to talk about the difficulty of finding mathematical proofs in real-world cases is a bit of a non-sequitur.

Why? That's the entire point of the entire field of computation complexity. But don't think I'm pessimistic. I just think software verification is an extremely complex subject, and anyone who says they can certainly tell now which approach will "win" simply does not know what they're talking about.

As Fred Brooks wrote, realizing the limitations and understanding the magnitude of the challenge is not pessimistic. Physicists don't feel that the speed of light is a non-sequitur. Quite the opposite: understanding the limitations and challenges allows us to focus our efforts where they'll do the most good, rather than bang our heads against the wall just because that's what we happen to know how to do.

> Dependent types in Haskell gives you the freedom to do more. You can choose not to use that freedom where it doesn't make sense.

Given that only 0.01% of programmers use Haskell in production I don't really care what approach Haskell uses. But as someone who takes verification seriously, I think contract systems give you the freedom to use deductive proofs -- just as dependent types do -- as well as other methods, which currently have proven much more practical and useful in the real world.

What I meant by non-sequitur is that I agree with your premises (that certain problems are computationally intractable or undecidable, and this is good to know), and I agree with your conclusion (full verification using deductive methods/dependent types is unfeasible for large software projects). However, I do not agree that the conclusion is true because of the premises.

That is, I don't think one can say that the reason why human-aided verification is unfeasible for some projects (say, the entire Linux kernel) is because of theoretical limitations. Yes, verifying the Linux kernel would take too much time, but not because of any specific theorem in theoretical computer science.

The reason why I am emphasizing this point is that I believe that referring to theoretical results in discussions like this makes it seems as though there is some kind of fundamental limitation that prevents people from verifying the Linux kernel in a proof assistant. Even though it is quite a jump to conclude that from theoretical results, especially when human intuition is involved.

I would say the reason that people haven't verified a program the size of the Linux kernel is that it's just too much work. Just like if I were to try to write the Linux kernel from scratch on my own, it would take me years and years. But not because of a mathematical theorem. Some things are just a lot of work without there being a deep reason behind it.

We can go much further with deductive verification combined with automated verification compared to automated verification alone, precisely because human intuition can be used to guide the automated tools.

EDIT: To give another example, I would have no problem with the following reasoning:

A SAT solver that is efficient for arbitrary SAT problems cannot be implemented, because SAT is NP-complete (assuming P =/= NP).

However, I would have a problem with the following line of reasoning:

Intel CPUs cannot be fully verified by people because SAT is NP-complete.

> That is, I don't think one can say that the reason why human-aided verification is unfeasible for some projects

That's not exactly what I said, or at least not what I think. I think that full, 100% confidence verification (the method is irrelevant) has some serious challenges as suggested by those theorems, and that it is, therefore, unjustified to believe that there would be a clear solution any time soon. One way to circumvent those challenges is, for example, to reduce the level of confidence.

> Yes, verifying the Linux kernel would take too much time, but not because of any specific theorem in theoretical computer science.

I'm not saying it definitely is, but I don't think anyone can say it isn't, either (well, technically, a theorem won't likely apply to a specific piece of code, except as a sample from some distribution).

> The reason why I am emphasizing this point is that I believe that referring to theoretical results in discussions like this makes it seems as though there is some kind of fundamental limitation that prevents people from verifying the Linux kernel in a proof assistant.

I think that the fundamental limitation (about the general case) suggests great challenges. And again, the specific method doesn't matter as long as it's sound. A proof assistant and a model checker are both bound by those results, as both are sound.

> Even though it is quite a jump to conclude that from theoretical results, especially when human intuition is involved.

I'm not sure human intuition is very relevant. We've had human intuition and proof assistants for the past 40 years, and still we haven't been able to prove software beyond a very small size. So I think that at the very least we should consider the possibility that if we can jump two feet up, then even though it seems like jumping all the way to the moon is a matter of degree, there just could be some fundamental obstacles in the way. I'm not saying we can't overcome them, but I'm saying we can't assume they're not there.

> But not because of a mathematical theorem. Some things are just a lot of work without there being a deep reason behind it.

OK, but people have written the Linux kernel, and that's a lot of work, too. Don't you think that if a mathematical theorem says that, in the worst case, A is much harder than B, and that in practice, A is, indeed, much harder than B, then maybe theory has something to do with it?

> We can go much further with deductive verification combined with automated verification compared to automated verification alone, precisely because human intuition can be used to guide the automated tools.

I think that's speculation and that it's unjustified by either theory or practice. It's unjustified by theory because if human intuition can consistently prove successful (to the point of being an industry technique), then whatever method is used suggests some recognized easier-than-worst-case subset, and so automated tools could take advantage of that, too.

It's unjustified by practice because you could have said the same thing thirty years ago, and here we are now, and automated tools are doing better on average (it's kind of complicated to compare, because there are cases where either one is much better than the other, but overall, when we look at "verification throughput", automated methods verify more software more cheaply). It is true that automated tools have some limitations that we haven't been able to break through, but so do manual proofs.

You could indeed make the case that the intractability/undecidability-type theorems suggest that the problem of program verification is inherently difficult, and not something that can be solved in most cases using a trick. It requires some deep thinking in many cases, the same kind of thinking required to find complex mathematical proofs.

Currently humans are much better at proofs that require this kind of deep thinking. Automated tools are better for proofs that are based on a simple structure, like a large enumeration of cases (as in SAT solving and model checking, oversimplifying things a bit). This is the very reason they are automated: the reasoning performed by them is so simple/well-structured that we are able to write a program to do it for us.

However, many proofs fall outside of this easy-to-automate subset, and require human assistance. Human intuition does consistently do better than automated tools, but there is no single easily recognizable "trick" or "method" to it, which makes it hard to implement in a tool.

In my view, the reason that theoretical results do not apply very directly to human reasoning is precisely that human reasoning is "intuitive" and hard to capture in a program. We are not really Turing machines in any practical sense. We are more adept at proving things (perhaps creating automated tools to help us along the way) than any known algorithm.

I absolutely agree that automated tools make much more sense for your average industry application, but if you want to be able to use the full power of human intuition, instead of restricting yourself to program properties that are easy to prove automatically, then there is no way around manual proofs.