The goal is to find bugs in code by throwing random data at it, in as intelligent a fashion as possible. You can do that a few ways:
* Give structured data to mutate a bit.
* Just throw random data at it. You could do this with any binary that accepts data either via stdin or from a file.
* Instrument the code, throw random data at it, see which code paths get triggered, and feed that back into the data generator. The drawback is that you need to be able to compile all the code involved so that it gets fully instrumented.
AFL/AFL++ sits in the third camp. You compile your code using it, and it then uses the information it gets back to figure out ways to trigger code paths by applying intelligent mutations. For example, you can take code that parses a PNG image file, start AFL++ off with no initial data, and it will fairly quickly start producing valid PNG images.
It's a very effective approach for finding bugs. On the AFL++ site there's a small trophy cabinet, and AFL, being the older project, has a larger one: https://lcamtuf.coredump.cx/afl/.
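As a rough illustration of the first approach in the list above (start from structured data and mutate it a bit), here is a minimal Python sketch. The `mutate` helper and the JSON seed are made up for this example; real fuzzers such as AFL++ use a much richer set of mutations.

```python
import random


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one small random mutation to a seed input (hypothetical helper)."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        # Flip a single bit in a randomly chosen byte.
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    elif choice == 1:
        # Insert a random byte at a random position.
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        # Delete a randomly chosen byte.
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


rng = random.Random(0)  # fixed seed so runs are repeatable
seed = b'{"key": [1, 2, 3]}'  # assumed example of "structured data"
variants = [mutate(seed, rng) for _ in range(5)]
for v in variants:
    print(v)
```

Each variant stays close to the valid seed, which is why this tends to get deeper into a parser than purely random bytes would.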
(edit: not to imply you were making fun, your answer was great! But in general: everyone learns things every day. Even every programmer has a day where they learn what fuzzing is)
This term is not used much. Most people know more about "random testing" or "monkey testing", which are much more common out there. I think fuzzing is used a lot to find security holes, and although it is fairly old it is still widely used, but it is not something programmers outside of systems programming see everywhere. Not all programmers work in the same field, so it is not uncommon for someone not to know about this. In my case, I associate the term fuzzing with matching, for example.
Somewhat. I think it might mostly be that it provides a much greater return for those using languages where incorrectly handled values have a higher chance of causing much worse problems, like C and C++. I think if you write in those languages, or like me you haven't for almost 20 years but you're just still very interested in developments about them because they often seem to illuminate the weird quirks of computing and CPUs, then fuzzing is a much more common thing to have heard about.
Not that fuzzing isn't useful for higher level or managed languages, just that it's extra useful when you throw likely segfaults into the mix.
Fuzzing is ROI efficient (especially for time invested) even if you don't intend to find a segfault, but just want to see how a program works or performs across different input states either in or out of its usual domain (and you can direct the fuzzing in many ways: derandomizing it, constraining the search space, or using a virtualizer like QEMU). I like to think of it as "semantics engineering" with spare CPU cycles.
Fuzzing is a technique where you send a lot of random or not-so-random data to the input of a program to see how it reacts: does it crash, does it handle the input properly, etc.
For example, say you want to test your JSON parser: what happens if you send it "{", ""\\{", etc.?
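To make the JSON example concrete, here is a small sketch that throws short random strings at Python's standard `json.loads`. The helper names (`random_json_like`, `fuzz_json`) and the character alphabet are assumptions for illustration. The idea is that a clean rejection (`ValueError`) is fine, while any other exception would be worth a closer look.

```python
import json
import random


def random_json_like(rng: random.Random, max_len: int = 12) -> str:
    """Build a short string from characters that matter to a JSON parser."""
    alphabet = '{}[]":,\\ 0123456789.-eEtruefalsn'
    length = rng.randrange(max_len + 1)
    return "".join(rng.choice(alphabet) for _ in range(length))


def fuzz_json(iterations: int, seed: int = 0) -> list:
    """Return the inputs that made json.loads fail in an unexpected way."""
    rng = random.Random(seed)
    unexpected = []
    for _ in range(iterations):
        text = random_json_like(rng)
        try:
            json.loads(text)
        except ValueError:
            # json.JSONDecodeError is a ValueError: the parser rejected
            # the input cleanly, which is the behavior we want.
            pass
        except Exception as exc:
            # Anything else means the parser did not handle the input properly.
            unexpected.append((text, repr(exc)))
    return unexpected


failures = fuzz_json(2000)
print(f"unexpected failures: {len(failures)}")
```

For a parser written in C or C++ the same loop would be watching for crashes rather than exception types.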
Fuzzers can find defects, including vulnerabilities, that might be missed by other tools. AFL used a newer technique, called "coverage-guided" fuzzing, that turned out to be a remarkable improvement. As a coverage-guided tool, it monitors how many times various code branches are taken, and if the count is different from what it has seen before, the input is considered "more interesting". AFL++ inherits this capability.
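The feedback loop can be sketched in a few lines of Python. This is a toy, not how AFL is implemented: it records the set of executed source lines via `sys.settrace` instead of branch hit counts, and the `target` function with its hidden "FUZZ" check is invented for the example.

```python
import random
import sys


def target(data: bytes) -> None:
    """Toy parser (hypothetical) with a bug behind four nested byte checks."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                if len(data) > 3 and data[3] == ord("Z"):
                    raise RuntimeError("bug reached")


def run_with_coverage(func, data: bytes):
    """Run func(data); return (set of executed line numbers, crashed flag)."""
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__:
            if event == "line":
                covered.add(frame.f_lineno)
            return tracer  # keep tracing lines inside the target
        return None  # ignore every other function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
        crashed = False
    except RuntimeError:
        crashed = True
    finally:
        sys.settrace(previous)
    return covered, crashed


def fuzz(func, iterations: int, seed: int = 0):
    """Mutate corpus entries, keeping any input that reaches new lines."""
    rng = random.Random(seed)
    corpus = [b"\x00\x00\x00\x00"]  # uninformative starting input
    seen = set()
    for i in range(iterations):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)
        child = bytes(child)
        covered, crashed = run_with_coverage(func, child)
        if crashed:
            return child, i + 1
        if not covered <= seen:
            # New coverage: this input is "more interesting", so keep it.
            seen |= covered
            corpus.append(child)
    return None, iterations


crash, tries = fuzz(target, 200_000)
print(f"crashing input {crash!r} found after {tries} executions")
```

Because each correct byte unlocks a new line, the fuzzer can get the four bytes right one at a time. Guessing all four at once blindly would take on the order of 256^4 (about 4.3 billion) attempts.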
If you're fuzzing open source software, you might consider applying to OSS-Fuzz https://github.com/google/oss-fuzz which provides a lot of free compute power to run fuzzers (so that vulnerabilities can be found & fixed).
The technique has been used for at least two decades in hardware verification, though the terminology is different. If you search the literature, you'll find terms like "constrained functional verification", "coverage directed test generation", "functional coverage directed test generation", and the like. The technique is the same: random testing, with mutation to try to hit more and more coverage points.
It goes back at least that far in software, with the original fuzzing work from U. Wisc and McKeeman's "Differential Testing for Software". Those are blackbox techniques; AFL's advance was using a general grey box approach.
The hardware approach isn't blackbox: it explicitly uses the reachable state space and constraint solving to reach more coverage points, and to do this it needs the exact circuit representation.