by mqus 1035 days ago
In the regular test suite (think CI) you want predictable results. Running it again and again on the same code should give the same results, so you can see exactly which code change broke things. Maybe it's simpler to explain the other way around: for every new path your fuzzer (or other randomized test) tests, it also doesn't test a path it tested in a previous run, and you probably want to add the failing paths it found to your regular test suite.

Don't get me wrong, we should have more randomization, but it's not good everywhere, which might explain why we don't have as much of it.

4 comments

it's rather easy to have both randomness and reproducibility, though:

generate a random seed, log it, then create an RNG using that "random, but recorded" seed. make sure all randomness used in the test flows from that explicitly-seeded RNG.

then, have an escape hatch where if a seed is provided as an environment variable, it will use that instead of generating one.

if you have a failure occur, you can always re-run with the same seed as a way to reproduce the failure (assuming it was indeed caused by that random seed and not some other factor)
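
a minimal sketch of that seed handling in python, stdlib only. the `TEST_SEED` variable name, the `make_test_rng` helper and the example test are made up for illustration, not taken from any particular framework:

```python
import os
import random
import sys


def make_test_rng(env_var="TEST_SEED"):
    """Return (seed, rng). The env var name is a hypothetical choice."""
    raw = os.environ.get(env_var)
    if raw is not None:
        # escape hatch: reuse a seed recorded from an earlier run
        seed = int(raw)
    else:
        # otherwise pick a fresh seed from the OS entropy source
        seed = random.SystemRandom().randrange(2**32)
    # log the seed so a failing run can be reproduced later
    print(f"{env_var}={seed}", file=sys.stderr)
    return seed, random.Random(seed)


def test_sort_is_idempotent():
    seed, rng = make_test_rng()
    # all randomness in the test flows from the explicitly seeded rng
    data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    once = sorted(data)
    assert sorted(once) == once, f"failed with seed {seed}"
```

re-running with the logged value exported under the same variable name should then replay the same sequence of random draws, as long as nothing in the test reaches for the global `random` module or another entropy source.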

depending on how fast the tests are, it may also be possible to run them multiple times with different seeds. for example, your on-every-commit CI run might run once with a hardcoded seed of 42. or it might run once with a hardcoded seed and once with a random seed.

and meanwhile, you might have a nightly test run that runs that same test suite 100 or 1000 times, with a different random seed each time.
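
the per-commit vs. nightly split could look roughly like this. every name here (`run_with_seeds`, `ci_seeds`, `nightly_seeds`, the example property) is hypothetical, and the run counts are just the ones mentioned above:

```python
import random


def check_reverse_roundtrip(rng):
    """Hypothetical property: reversing a list twice gives it back."""
    data = [rng.randint(0, 99) for _ in range(rng.randint(0, 20))]
    assert list(reversed(list(reversed(data)))) == data


def run_with_seeds(check, seeds):
    """Run check once per seed and return the seeds that failed."""
    failed = []
    for seed in seeds:
        try:
            check(random.Random(seed))
        except AssertionError:
            failed.append(seed)
    return failed


def ci_seeds():
    # on every commit: one pinned seed plus one fresh random seed
    return [42, random.SystemRandom().randrange(2**32)]


def nightly_seeds(runs=100):
    # nightly: many runs, each with its own fresh seed
    sysrand = random.SystemRandom()
    return [sysrand.randrange(2**32) for _ in range(runs)]
```

returning the failing seeds instead of stopping at the first failure means a nightly run can report every seed worth replaying in one go.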

Any half decent fuzzing setup will log what it did prior so you can replay it to the point of failure. This gets a lot harder when you do multiple such runs in parallel.
AFL++ logs the specific input that causes the crash. In theory, at least, replaying that input ought to trigger the crash reproducibly. (That is sometimes not the case if the program has lots of threads, is event-driven, or is otherwise stochastic.)
That's all true, but at some point the combinations of paths explode. It is not possible to write tests for all the combinations, but it is possible to cover them eventually with some probability. Fuzzing covers more execution path combinations over time.
hypothesis kind of solves this problem by adding each (minimized) failing input to a file and always running it thereafter

this is a little tricky to integrate into ci
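
to make the idea concrete, here is a stdlib-only sketch of "save each failing input to a file and always replay it first". this is not hypothesis's api or implementation, it skips minimization entirely, and the file format and every function name are assumptions for illustration:

```python
import json
import os
import random


def load_corpus(path):
    """Read previously saved failing inputs, one JSON value per line."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def save_failure(path, example):
    """Append a failing input unless it is already recorded."""
    if example in load_corpus(path):
        return
    with open(path, "a") as f:
        f.write(json.dumps(example) + "\n")


def run_property(check, generate, path, rng, tries=100):
    """Replay saved failures first, then try fresh random inputs."""
    for example in load_corpus(path):
        check(example)  # a known-bad input that still fails raises here
    for _ in range(tries):
        example = generate(rng)
        try:
            check(example)
        except AssertionError:
            save_failure(path, example)
            raise
```

the ci wrinkle shows up here too: the file only helps if it survives between runs, so it has to be committed or cached somewhere the ci job can read it.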