Hacker News
by yosefk 1038 days ago
The relative rarity of input (pseudo-)randomization in software testing is nearly inexplicable to me, except by the very low cost to the software vendor of all but the most commonly reproducing bugs.

In the regular test suite (think CI) you want predictable results. Running it again and again on the same code should give the same results, so you can see exactly which code change broke things. Maybe it's simpler to explain the other way around: for every new path your fuzzer (or other randomized test) exercises, it also skips a path it tested in a previous run, so you probably want to add the failing paths it found to your regular test suite.

Don't get me wrong: we should have more randomization, but it isn't appropriate everywhere, which might explain why we don't see more of it.

it's rather easy to have both randomness and reproducibility, though:

generate a random seed, log it, then create an RNG using that "random, but recorded" seed. make sure all randomness used in the test flows from that explicitly-seeded RNG.

then, have an escape hatch where if a seed is provided as an environment variable, it will use that instead of generating one.

if a failure occurs, you can always re-run with the same seed to reproduce it (assuming it was indeed caused by that random seed and not some other factor).
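
A minimal Python sketch of that scheme. It assumes a hypothetical `TEST_SEED` environment variable as the escape hatch and a test runner that shows a test's printed output when it fails; the helper `make_test_rng` is also made up for illustration:

```python
import os
import random
import secrets


def make_test_rng(env_var: str = "TEST_SEED") -> tuple[random.Random, int]:
    """Return (rng, seed). The seed comes from $TEST_SEED if set, else is freshly random.

    TEST_SEED is a hypothetical variable name chosen for this sketch.
    """
    env_seed = os.environ.get(env_var)
    if env_seed is not None:
        seed = int(env_seed)          # escape hatch: replay a specific seed
    else:
        seed = secrets.randbits(64)   # fresh seed for this run
    print(f"{env_var}={seed}")        # log it so a failure can be reproduced
    return random.Random(seed), seed


def test_sort_is_idempotent() -> None:
    # All randomness comes from the explicitly seeded rng, never from the
    # module-level random.* functions, so the seed fully determines the input.
    rng, _seed = make_test_rng()
    data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    once = sorted(data)
    assert sorted(once) == once


if __name__ == "__main__":
    test_sort_is_idempotent()
```

To reproduce a failure, re-run the same test with `TEST_SEED` set to the value it printed.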

depending on how fast the tests are, it may also be possible to run them multiple times with different seeds. for example, your on-every-commit CI run might run once with a hardcoded seed of 42. or it might run once with a hardcoded seed and once with a random seed.

and meanwhile, you might have a nightly test run that runs that same test suite 100 or 1000 times, with a different random seed each time.
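
The same idea extends to the per-commit vs. nightly split. A rough Python sketch, where the mode names (`commit`, `commit+random`, `nightly`) and both helper functions are hypothetical, and a "test" is assumed to be any function that takes a seeded `random.Random` and raises `AssertionError` on failure:

```python
import random
import secrets
from typing import Callable, Iterable

Test = Callable[[random.Random], None]


def seeds_for_mode(mode: str, nightly_runs: int = 1000) -> list[int]:
    """Pick the seeds to run for a given (hypothetical) CI mode."""
    if mode == "commit":
        return [42]                                # fixed seed: fully repeatable
    if mode == "commit+random":
        return [42, secrets.randbits(64)]          # fixed seed plus one fresh seed
    if mode == "nightly":
        return [secrets.randbits(64) for _ in range(nightly_runs)]
    raise ValueError(f"unknown mode: {mode!r}")


def failing_seeds(test: Test, seeds: Iterable[int]) -> list[int]:
    """Run the test once per seed and return the seeds that failed."""
    failed = []
    for seed in seeds:
        try:
            test(random.Random(seed))              # each run gets its own seeded RNG
        except AssertionError:
            failed.append(seed)
    return failed
```

Each seed in the returned list can then be replayed on its own, for instance through an environment-variable escape hatch like the one described above.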

Any half-decent fuzzing setup logs what it did beforehand, so you can replay it up to the point of failure. This gets a lot harder when you run several such fuzzers in parallel.

AFL++ logs the specific input that caused the crash. In theory, at least, replaying that input ought to trigger the crash reproducibly. (That's sometimes not the case if the program has lots of threads, is event-driven, or is otherwise stochastic.)
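
Replaying can be scripted. Below is a Python sketch that feeds each saved input to the target several times and counts how often it crashes, which also exposes the nondeterministic cases mentioned above. It assumes the target reads its input from stdin and that a crash shows up as death by a signal (a negative return code from `subprocess.run` on POSIX). The directory is whatever crash directory your fuzzer wrote, e.g. a `crashes/` directory under AFL++'s output directory. The function name `replay_crashes` is hypothetical:

```python
import subprocess
from pathlib import Path


def replay_crashes(cmd: list[str], crash_dir: Path, repeats: int = 5,
                   timeout_s: float = 10.0) -> dict[str, int]:
    """Feed every file in crash_dir to cmd on stdin `repeats` times.

    Returns {file name: number of runs that died from a signal}. A count
    below `repeats` suggests the crash depends on more than the input
    (threads, timing, other nondeterminism).
    """
    results: dict[str, int] = {}
    for path in sorted(p for p in crash_dir.iterdir() if p.is_file()):
        if path.name == "README.txt":   # AFL-style crash dirs include a README
            continue
        data = path.read_bytes()
        crashes = 0
        for _ in range(repeats):
            try:
                proc = subprocess.run(cmd, input=data, capture_output=True,
                                      timeout=timeout_s)
            except subprocess.TimeoutExpired:
                continue                # a hang is not counted as a crash here
            if proc.returncode < 0:     # POSIX: -N means the process died from signal N
                crashes += 1
        results[path.name] = crashes
    return results
```

For example, `replay_crashes(["./target"], Path("out/default/crashes"))` would replay each file five times against a hypothetical `./target`; adjust the path to match your own fuzzer's output layout.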

That's all true, but at some point the number of path combinations explodes. When it's not possible to write tests for every combination, it's still possible to cover them eventually with some probability. Fuzzing covers more execution-path combinations over time.
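
To put rough numbers on "eventually, with some probability": if each randomized run independently lands on one particular path combination with probability p, the chance of having hit it at least once after n runs is 1 - (1 - p)^n. A small Python sketch; the scenario of 10 independent boolean options (1024 combinations, each run picking one uniformly at random) is an assumption for illustration, not something from the thread:

```python
import math


def p_hit_at_least_once(p: float, runs: int) -> float:
    """Probability that `runs` independent trials hit an outcome of probability p."""
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p: float, target: float) -> int:
    """Smallest number of runs whose hit probability reaches `target`."""
    # Solve 1 - (1 - p)^n >= target  =>  n >= log(1 - target) / log(1 - p)
    return math.ceil(math.log1p(-target) / math.log1p(-p))


combos = 2 ** 10          # assumed: 10 independent boolean options
p = 1 / combos
for n in (100, 1000, 5000):
    print(f"{n:>5} runs: {p_hit_at_least_once(p, n):.1%}")
print("runs for 99%:", runs_needed(p, 0.99))
```

Under these assumptions, 1000 runs hit a given combination about 62% of the time and roughly 4,700 runs are needed for 99%, whereas covering it with certainty by hand would take all 1024 cases.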

Hypothesis kind of solves this problem by adding each (minimized) failing input to a file and always running it thereafter.

this is a little tricky to integrate into CI
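
The basic mechanism can be imitated without Hypothesis. Below is a stdlib-only Python sketch that replays every previously failing input before trying new random ones, and appends new failures to a file. It is not Hypothesis's implementation: there is no shrinking, the JSON file format and the name `check_property` are made up, and inputs must be JSON-serializable. For CI, one option (an assumption about your workflow) is to commit that file to the repository so fresh checkouts still replay it:

```python
import json
import random
from pathlib import Path
from typing import Any, Callable, Optional


def check_property(prop: Callable[[Any], bool],
                   gen: Callable[[random.Random], Any],
                   db_path: Path,
                   n_random: int = 100,
                   seed: Optional[int] = None) -> list[Any]:
    """Return the inputs that currently violate `prop`, replaying saved ones first."""
    saved = json.loads(db_path.read_text()) if db_path.exists() else []
    # 1) Always re-run every input that has failed in an earlier run.
    failures = [v for v in saved if not prop(v)]
    # 2) Then explore fresh random inputs.
    rng = random.Random(seed)
    for _ in range(n_random):
        value = gen(rng)
        if not prop(value) and value not in failures:
            failures.append(value)
    # 3) Persist new failures; old ones stay on file as regression cases.
    for v in failures:
        if v not in saved:
            saved.append(v)
    db_path.write_text(json.dumps(saved))
    return failures
```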

I love fuzzing as a technique, use it quite regularly, and I'm even the maintainer of AFL++ in Fedora. But running AFL++ on even a single program occupies all threads of a high-end AMD server for weeks. I run it locally, so I'm only paying for the electricity; if it were a cloud instance, it would cost a small fortune. I think this is one reason it isn't used more widely. In addition, most CI systems assume the tests will finish in a small, finite amount of time, not run for weeks on end.
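
One common compromise with CI time limits is to give a randomized test a fixed wall-clock budget per run and leave depth to longer nightly or dedicated jobs. A Python sketch under that assumption; the names are hypothetical, and this is a toy random-input loop, not a coverage-guided fuzzer like AFL++. Because all inputs come from one seeded RNG, the seed plus the returned iteration count is enough to regenerate the failing input:

```python
import random
import time
from typing import Any, Callable, Optional


def fuzz_for(budget_s: float,
             check: Callable[[Any], bool],
             gen: Callable[[random.Random], Any],
             seed: int) -> tuple[int, Optional[Any]]:
    """Generate and check inputs until `budget_s` seconds elapse or a check fails.

    Returns (index of the failing input or number of inputs tried, failing input or None).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    iterations = 0
    while time.monotonic() < deadline:
        value = gen(rng)
        if not check(value):
            return iterations, value    # input number `iterations` (0-based) failed
        iterations += 1
    return iterations, None


def regenerate(seed: int, index: int, gen: Callable[[random.Random], Any]) -> Any:
    """Recreate input number `index` from the same seed.

    Only valid if `gen` is the sole consumer of the RNG, as in fuzz_for above.
    """
    rng = random.Random(seed)
    for _ in range(index):
        gen(rng)                        # skip the inputs that passed
    return gen(rng)
```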

I will note that Google have a programme for doing fuzz testing on open source projects using compute from their cloud: https://google.github.io/oss-fuzz/

hardware people keep saying that for some reason

maybe someday software people will listen

that would be a good day