by barkmadley 5286 days ago

    and it makes sense that static analysis tools won't discover anything not already covered by tests
That assertion doesn't make a lot of sense to me. Most static analysers are capable of looking at the edge cases that a human may forget to write a test for, or may not even notice are there.

When you have 1000 times as much test code and data as you have code to test, I would say the odds of having missed a subset of the bugspace that a static analyzer can find are very, very low. Unless the people who wrote the tests were incompetent, of course, but let's consider that not to be the case.
I'm not so sure. I'm reading their testing document (and it's a great read), and it sounds like most of those tests are reactive:

Whenever a bug is reported against SQLite, that bug is not considered fixed until new test cases have been added to the TCL test suite which would exhibit the bug in an unpatched version of SQLite. Over the years, this has resulted in thousands and thousands of new tests being added to the TCL test suite. These regression tests ensure that bugs that have been fixed in the past are not reintroduced into future versions of SQLite.

While this is a great practice, it's reactive. It's the result of particular bugs, not someone asking, "What are the situations we haven't covered?"

The coverage they have for error conditions (file system, out of memory, bit-flips) is impressive. I'm not saying I know you're wrong, but I think there are too many variables to say with confidence either way.

The closest you can get to complete test coverage is a policy of writing tests for every single new feature, coupled with this regression test policy, because once the code base gets to a certain size it's impossible for anyone to form a complete mental model of what is covered and what is not.

If that's not good enough then I think static analysis is a decent step, but it probably pales in comparison to using stricter languages (e.g. Haskell).

The amount of test code and test data you need is proportional to the number of possible codepaths through your code.

This is generally exponential in the number of functions, modules, etc. involved. For example, a function with N if statements that are not nested generally needs 2^N test cases to exercise every path through it.

So having 1000 times more tests than code may not mean that you have complete coverage at all. It depends on the structure of the tests and the code.
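
To make the 2^N point concrete, here is a minimal Python sketch. It is only an illustration of the counting argument, not anything taken from SQLite's test suite, and `enumerate_paths` is a made-up helper name. It treats each non-nested if statement as an independent taken / not-taken choice and lists every combination:

```python
from itertools import product


def enumerate_paths(n_ifs: int) -> list[tuple[bool, ...]]:
    """List every path through n_ifs sequential, non-nested if statements.

    Each path is a tuple of booleans: True means that if was taken.
    Assumes the conditions are independent of one another.
    """
    return list(product((False, True), repeat=n_ifs))


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(f"{n} ifs: {len(enumerate_paths(n))} paths")
    # Larger N is computed rather than enumerated.
    print(f"30 ifs: {2 ** 30} paths")
```

Ten independent if statements already give over a thousand paths, which is why the raw ratio of test code to product code says little by itself.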

I think your analysis is for executing every distinct code path. I read through the SQLite testing page, and they claim 100% branch coverage, which means that they test every possible outcome of a branch - but that's different from what you're going after, which is every possible code path.

(Not disagreeing, I just had to go through this process in my head when I thought about what you said in comparison to what they said.)

Indeed. What they're doing is much better than what most software projects manage, but not quite enough to test correctness unless the code in later branches is completely independent from the code in earlier branches....
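
A small Python sketch of the gap between branch coverage and path coverage, under the assumption that a later branch depends on state set by an earlier one. The function `ratio` and its parameters are hypothetical, invented purely for illustration:

```python
def ratio(a: int, reset: bool, divide: bool) -> float:
    """Toy function with two sequential branches that share state."""
    denom = 4
    if reset:
        denom = 0            # the earlier branch changes state ...
    result = float(a)
    if divide:
        result = a / denom   # ... that the later branch depends on
    return result


def run_branch_covering_tests() -> None:
    """Two tests that together hit both outcomes of both ifs."""
    assert ratio(8, reset=True, divide=False) == 8.0   # reset taken, divide not taken
    assert ratio(8, reset=False, divide=True) == 2.0   # reset not taken, divide taken


if __name__ == "__main__":
    run_branch_covering_tests()
    print("branch-covering tests pass")
    try:
        ratio(8, reset=True, divide=True)  # the path neither test exercises
    except ZeroDivisionError:
        print("untested path (reset, divide) divides by zero")
```

Both tests pass and every branch outcome is exercised, yet only two of the four paths have run, and the failure sits on one of the two that have not. If the two branches did not share `denom`, the branch-covering pair would be enough.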

For any nontrivial project, testing every codepath is basically impossible, unfortunately. :(