Hacker News new | ask | show | jobs
by flyingmutant 4583 days ago
While that undoubtedly sounds very impressive, IMO tests should not be what you use to ensure that your software is correct — in my opinion, they can be actively a bit harmful by giving you a wrong impression of correctness.

Just take a look at https://www.sqlite.org/changes.html — after the recent 3.8.0, there were 2 bug fix releases separated by 3-4 days (!), with bug descriptions sounding relatively, hmm, trivial, should I say?

I use SQLite, and consider it a great software; just want to make a point about my own dislike of over-reliance testing. Hight-level languages and strong (automatically) provable invariants should be our basis for writing reliable software, not C with a ton of tests.

8 comments

1. 3.8.0.2 fixed a logical error in the code. High-level language won't help you if you invert a logical condition or commit some more subtle logical mistake.

2. Our software also needs to be fast. No one needs a slow database engine. There is no high-level language that beats C on tasks like writing sqlite.

3. Finally, show me that unicorn high-level language with strong, automatically provable invariants and one piece of widely used, reliable software that was written in it.

Sqlite works. Testing works.

If there actually was a credible alternative to C for writing software that is efficient and reliable with requirements that sqlite has, I can assure you that people would use it. The problem is: it doesn't exist.

> 3. Finally, show me that unicorn high-level language with strong, automatically provable invariants and one piece of widely used, reliable software that was written in it.

Not implying that this is true now, but this is the specific goal of Rust, with Servo.

With its current design trajectory, Rust won't be able to prove interesting high-level invariants. It's going to give you the kinds of things you can prove with traditional ML-style type systems and affine types. The combination of the two lets you encode some interesting things, e.g. typestates, but they won't let you express something like "this browser engine correctly implements CSS".
Contenders would be ATS and Nimrod. Neither fulfills the "one piece of widely used, reliable software that was written in it" requirement, though.
Airbus planes run software written in Ada. Ada does fairly well in the programming language shootout and is designed for writing reliable software.
Most high-level languages have the drawback that they usually cannot inter-operate with anything else, or if they can, it gets very ugly. This begins with C++ being impossible to consume from anything else than C++ and certainly doesn't stop with Java or similar languages. For a portable, low-level library that aims to exist nearly everywhere I doubt there is anything better than C.

And you'd need tests either way. With certain high-level languages you could reduce the need for valgrind and signed integer overflow tests perhaps, but nearly all other test (and thus error) classes remain. You will want to be safe from regressions, you will want to make sure that you can recover gracefully upon process termination, OOM or power failures. You also want to make sure that your SELECT doesn't yield different result sets than other database engines for the same data set.

All those cases are not going away, just because you use a different language. And proving things correct necessitates that your formal model accurately describes what you want to create. Bugs in said model will still be bugs in the application, even if you proved that the application conforms exactly (which still isn't easy).

Typical ivory tower hn blather honestly. I would love to hear a real solution instead of self aggrandizement and vague ideas.
But is possible to create very serious bugs even with high level languages, and proving software in a way to provide good coverage is extremely hard.
Yeah - it's ludicrous to suggest provable constraints will ever really achieve anything close to the coverage these tests have.

Having said that, if you have the choice, it's still unwise to spend huge amounts of time writing+maintaining tests but then balk at the relatively short amount needed to use a strong type system. It won't catch lots of things, but it will catch quite a few. For the limited concepts it can express, it can ensure correctness more rigorously, easily and quickly than any testing can. Might as well use the easy method where you can, right?

But sqlite is old and extremely portable - the only language that springs to mind would be C++. And while that does provide advantages, it's a bit of a stretch to claim those would merit a rewrite.

If you don't actively maintain the tests, then yes you can get a wrong impression of correctness. But when done properly testing helps ensure a certain level of correctness and old bugs don't reappear. A good rule to follow is to create a test(s) for each bug and run the tests on every check-in which is pretty easy with continuous integration. This will mitigate most bugs before they reach production. And the ones that do reach production get fixed...forever.
This "old bugs reappearing" how does this happen? People make the same logical mistakes in new code? Old buggy code is literally resurrected somehow?
Making changes again at the same place where a bug was fixed previously is a common offender. At least at my workplace we have regressions every now and then (and thanks to no test suite those sometimes make their way to the customer, yay). Basically every time you touch code you have the potential to break something. Even more so when a bugfix was done by adding code, not by changing it. People might accidentally rewrite the fix. If you have no tests that ensure that functionality stays that way it can be easy to break things in the process.
From my days as a release/build manager, the most common cause of this was people merging code from branches where fixes had been overlooked and similar horrors. The second highest would be people fixing something that "looked wrong" but was actually there to solve a problem.
Yeah I should have probably used the technical term: regression. It is very easy to do. Sometimes it happens when code is rewritten/refactored. Sometimes a new feature utilizes a buggy code path that was not previously used/tested. Etc.
It's pretty easy to make the same mistake that originally caused a bug when rewriting code.
There are no automatically provable invariants that would cover SQLite's functionality. How does a tool automatically prove that a query compiler is correct, for example? I don't see an easy way, barring writing SQLite in Coq.
Your choice of language can help you avoid, or even completely eliminate, certain types of defect.

But no language is ever going to prevent you from coding the wrong intent into your application - your programming language has no idea whether (for example) you are intending to keep or omit a left join in your optimisation routine.

So, how would you test database software?

This reminds me of an old Google(?) interview question.

"How do you test a Calculator after dropping it?" http://www.youtube.com/watch?v=ILkT_HV9DVU&t=46m50s

For all the possible things that could go wrong in a fancy graphing calculator, what should you do to ensure it works properly?

For interview purposes, I build a LEGO Mindstorms framework to push the buttons and check that the correct LCD segments light up. And wait several weeks while it runs through a sufficient number of iterations to give some assurance of correctness.

For real-world purposes, I see if it turns on and gives me 4 when I do 2+2. If it fails, I buy a new one off Amazon.