Hacker News new | ask | show | jobs
by todd8 1035 days ago
I was working on Distributed Services for AIX in 1986 and 1987, a distributed filesystem to compete with NFS. As this was being developed by a dev team, my colleague and I pondered how to test the system that we had architected.

There were so many possible states that a system's file system can be in. Were the conventional tests going to catch subtle bugs? Here's an example of an unusual but not unheard of issue: in a Unix system a file can be unlinked and hence deleted from all directories while remaining open by one or more processes. Such a file could still be written to and read by multiple processes at least until it was finally closed by all processes having open file descriptors at which point it would disappear from the system. Does the distributed file system model this correctly? Many other strange combinations of system calls might be applied to the file system. Will the tests exercise these.

It occurred to me that the "correct" behavior for any sequence of system calls could be defined by just running the sequence on a local file system and comparing the results with running an identical sequence of system calls against the distributed file system.

I built a system to generate random sequences of file related system calls that would run these on a local file system and a remote distributed file system that I wanted to test. As soon as a difference in outcome resulted the test would halt and save a log of the sequence of operations.

My experience with this test rig was interesting. At first discrepancies happened right away. As each bug was fixed by the dev team, we would start the stochastic testing again, and then a new bug would be found. Over time the test would run for a few minutes before failure and then a few minute longer and finally for hours and hours. It was a really ingesting and effective way to find some of the more subtle bugs in the code. I don't recall if I published this information internally or not.

3 comments

That's very much my experience: asymptotic reduction in bug incidence and matching increase in runtime between subsequent errors. To the point that they are so rare that you think you have fixed all of them. But that's usually an illusion: it's just that the error rate is now so low that you no longer observe incidents yourself. The only way to get past that hurdle is to do the bad thing: release and hope that you got it right and if you didn't that the incidents will not be too bad.

You could run many tests in parallel to reduce the chance but it will never be completely zero. Writing bug free software this way is hard. The better way is to design it from the ground up with a bunch of instrumentation that keeps all of your invariants under close observation and that stops the moment anything is not according to your assumptions. This usually gets you to a high level of confidence that things really do work as designed. But of course, that also isn't perfect and residual risk (and residual bugs...) will always remain in any system of even moderate complexity. File systems are well above that level, especially distributed file systems.

I think there's room for a hybrid approach, not just fixing the bug a fuzz test found, but treating the fuzz test error as a bug in invariants (either inadequate or violated in a subtle way that's not caught) and addressing that underlying issue.
Yes, good point: often it is the assumptions themselves that were broken. So effectively you are debugging both the real system and the monitoring system.
> The better way is to design it from the ground up with a bunch of instrumentation that keeps all of your invariants under close observation and that stops the moment anything is not according to your assumptions.

Any personal "war stories" you are willing to share explaining how you went about designing such a system? :-) Or any presentation of someone who did that?

I wrote about retrofitting such a system onto an existing one:

https://jacquesmattheij.com/all-programming-is-bookkeeping/

Which was written in the context of:

https://jacquesmattheij.com/saving-a-project-and-a-company/

And without which that project likely would not have seen a successful end result.

It really is a war story, and what I like most about it is how at the drop of a hat a team assembled to get the job done, cleanly tackled the problem(s) and transferred duties to a new crew.

The key to modern fuzzing is feedback, usually some kind of coverage measurement of the program under test. This allows the fuzzer to be much smarter about how it finds new code paths and discards inputs that don't extend coverage. This makes fuzzing find bugs a lot quicker.

Google have a project to do fuzzing on Linux system calls using coverage feedback: https://github.com/google/syzkaller

I used this strategy for implementing a regex engine. I wanted to completely imitate the Lua pattern implementation, so I generated random patterns, ran them on random strings and compared the results.

It was very pleasant to work with such a system. Nowadays I would probably fuzz the patterns with AFL somehow.