The idea is that, if you can describe the problem (here, sorting) with a few lines, you don't have to prove that the Idris code is correct, because the machine do it for you. You can throw that sort function without even looking at the code in any critical space mission you want and, if the premises are true, then it will always correctly sort.
You can imagine 3 kinds of code :
- a code that describes the problem (that is both human readable and machine readable)
- a code implementing the solution (here, your Python script)
- a code proving the implementation correctly answers the problem (which would be math if you had to prove your Python code on a whiteboard)
The idea is more about reusing and sharing trusted code, re-running the proof on your own system to make sure it's legit, and move on. You don't need to look at the proof to use the sort function, you would only look at the code that describes the problem and make sure it's the right one for your problem at hand.
>The idea is that, if you can describe the problem (here, sorting) with a few lines, you don't have to prove that the Idris code is correct, because the machine do it for you
The machine can't show that your specification is correct, only that it's self consistent. What you're doing is writing a proof that the compiler will use to verify your code. However, if you make a mistake in your proof, then the compiler will happily verify the wrong thing.
The argument is that it's much harder to spot a mistake in a 300 line Idris proof than it is in a 5 line Python implementation.
> However, if you make a mistake in your proof, then the compiler will happily verify the wrong thing.
I don't use theorem provers in my to day to day tasks, but I had the 101 course at University. Back then I wrote basic proof with the tools, and if your problem is correctly stated and entered in the system, you simply CAN'T arrive to an invalid proof. The only way to fool the compiler is to use logical shortcuts, which are clearly defined in term of language keywords, so you know exactly where is the weakness in the proof, and look for them.
Edit : I don't know every theorem provers out there, so to give a bit more context about my experience, it was with the Coq theorem prover
> if your problem is correctly stated and entered in the system, you simply CAN'T arrive to an invalid proof.
How is that different from: If your problem is correctly stated, and correctly coded into in the system using x86 machine language, then you can't arrive at a bug!
At least in Coq, there is no "bug" when you write a proof, it uses mathematical notations and logical rules to conclude a fact. Because of that you cannot state all the problems you would be able to state with x86. What I meant is that such theorem provers don't give you false-positive : if it tells you your proof is correct, then there can't be a "bug" in your reasoning
I think you missed the point I was making which is that you still have to know that you're proving the right thing.
It's entirely possible to have a but in the proof itself, at which point your program is going to incorrect.
Reading and understanding the Idris proof is actually more work than understanding the untyped Python version, therefore it's actually harder to say whether it's correct or not in a semantic sense.
> I think you missed the point I was making which is that you still have to know that you're proving the right thing.
Right now, it's a missing part in many theorem prover systems (but I didn't do exhaustive researching, so it's more my point of view) : a code to succinctly describe, or state, the problem you are trying to solve. For exemple, for a sorting problem, you would state the problem in English : after sorting, for any element E, the next element in the sequence should be lower.
> It's entirely possible to have a but in the proof itself, at which point your program is going to incorrect
The premise of theorem provers is that if the problem is correctly stated, then a proof of a solution passing the prover's compiler and a few human reviews is even more unlikely to have a bug.
> Reading and understanding the Idris proof is actually more work than understanding the untyped Python version, therefore it's actually harder to say whether it's correct or not in a semantic sense.
I would be curious to see a Python proof of the Python sort function. I mean an actual logical and mathematical proof, not a unit test or fuzzy test. It would imply to create a library with a DSL around math and logic.
“Saying it’s correct in the semantic sense” = proving the code.
They found a bug in Java’s binary search after 10 years of it going unnoticed, so I think you’re overestimating your own ability to prove code correct in your head.
Generative testing/spec is absolutely useful. It doesn’t subsume static typing and much less theorem proving though, and you can QuickCheck your pre and postconditions in static languages as well.
I don't find that to be a convincing argument in the slightest. The type system in Java also failed to catch the bug, and there's no guarantee that you wouldn't end up with an error in a specification using a more advanced type system that could actually encode that constraint.
The thing to realize is that theorem proving is not a business goal in most cases. What you care about is being able to deliver software that works well in a reasonable amount of time. Sure, you might end up with a binary search bug in your code that goes unnoticed for 10 years, but clearly the world didn't stop because of that, and people have successfully delivered many projects in Java despite that bug that work perfectly fine.
> The type system in Java also failed to catch the bug
Java is not a theorem prover, and your comment was about theorem provers.
> you wouldn't end up with an error in a specification
Again, completely orthogonal to your argument. If you have an error in specification tests won't help you because you'll test the wrong thing.
> The thing to realize is that theorem proving is not a business goal in most cases. What you care about is being able to deliver software that works
Again, your comment was about "saying things are correct in the semantic sense". You're saying now "I don't care about correctness in the semantic sense, I care about delivering software that "works" (whatever this means; clearly it doesn't mean software without bugs) in a reasonable amount of time" and that's fine. I do too, but again, nothing to do with your original argument.
>Java is not a theorem prover, and your comment was about theorem provers.
No, my comment was about formal methods in general. Every type system is a form of a machine assisted proof in practice. It's just that most type systems don't allow you to prove interesting things about your code.
The main problem is that of basic cost/benefit analysis. If it takes significantly more effort to write a full formal proof, and you end up sometimes catching minor errors, the effort is not justified in most scenarios.
>Again, completely orthogonal to your argument. If you have an error in specification tests won't help you because you'll test the wrong thing.
This is completely central to my argument. I'm saying that encoding a meaningful specification using a type system is a lot more difficult than doing that using runtime contracts, or even simply reasoning about the code unassisted.
I can read through the 5 lines of Python code implementing insertion code, and be reasonably sure that it's correct. I would have a much harder time verifying that 300 lines of Idris are specifying what was intended.
> You're saying now "I don't care about correctness in the semantic sense, I care about delivering software that "works" (whatever this means; clearly it doesn't mean software without bugs) in a reasonable amount of time" and that's fine. I do too, but again, nothing to do with your original argument.
I'm saying that you have diminishing returns. What you want to show is semantic correctness, and type systems are a poor tool for doing that. So, while you can use a type system to do that in principle, the effort is not justified vast majority of the time.
> I can read through the 5 lines of Python code implementing insertion code, and be reasonably sure that it's correct.
Binary search is possibly one of the simplest and most basic CS algorithms, and yet, it took people who were "reasonably sure" 10 years to find that bug.
> I would have a much harder time verifying that 300 lines of Idris are specifying what was intended.
Then don't use Idris?
> I'm saying that you have diminishing returns.
I agree. However, that was not your argument. Your argument was that tests subsume proofs, and that's obviously wrong.
Also, types vs tests is a false dichotomy. Personally, I find a strongly typed language + QuickCheck to be the most practical way of developing complex software. YMMV, and that's fine.
> The thing to realize is that theorem proving is not a business goal in most cases. What you care about is being able to deliver software that works well in a reasonable amount of time. Sure, you might end up with a binary search bug in your code that goes unnoticed for 10 years, but clearly the world didn't stop because of that, and people have successfully delivered many projects in Java despite that bug that work perfectly fine.
I agree with you on that part. However, if the whole stack of computer technologies were more reliable, we would maybe think about new business usecases that we subconsciously dissmised because of lack of trust in current technology.
It's basic cost/benefit analysis. At some point you end up with diminishing returns on your effort that are simply not worth the investment.
Also worth noting that nobody has been able to show that static typing leads to more reliable code in practice https://danluu.com/empirical-pl/
Claiming that to be the case putting the cart before the horse. A scientific way to approach this would be to start by studying real world open source projects written in different languages. If we see empirical evidence that projects written in certain types of languages consistently perform better in a particular area, such as reduction in defects, we can then make a hypothesis as to why that is.
For example, if there was statistical evidence to indicate that using Haskell reduces defects, a hypothesis could be made that the the Haskell type system plays a role here. That hypothesis could then be further tested, and that would tell us whether it's correct or not.
This is pretty much the opposite of what happens in discussions about static typing however. People state that static typing has benefits and then try to fit the evidence to fit that claim.
I agree with you, there is no much data about the effectiveness of more rigorous software development tools. It's clearly a research topic.
My intuition is that, with advances in robotics and AI, we may see the need of more robust logical systems. At some point, mathematicians and algorithmicians may use those tools to prove new concepts, which they will be able to share and valid more quickly, which then will percolate into software engineering more quickly.
Beyond software engineering, the expirements with theorem provers may lead to new ways of exchanging information, such as news, mathematics and legal documents. There are inspirations to take from the formalizations created in theorem provers for applying automated reasoning in more domains than just programming (imho)
Edit : Intuitively, it has do to with building trust in the algorithms and informations we share at light speed. With more validated building blocks, we may explore more complex systems. Accelerating trust between different entities can only lead to plus value, I guess. That's all very abstract tho
Edit 2 : too put it in even fewer words, theorem provers may be more about collaboration than just pure technical engineering
> Reading and understanding the Idris proof is actually more work than understanding the untyped Python version, therefore it's actually harder to say whether it's correct or not in a semantic sense.
But you don't have to write a proof to sort a list in Idris. So you're not being honest with your comparison.
Sure, but then you're not getting the formal guarantees. The whole argument here is regarding whether adding more formalism improves code quality. If you agree that less formalism is better in some cases, then it's just a matter of degrees.
Might it not actually be easier to prove that the Python code sorts than that the Idris code proves sorting?