by simonhughes22 1918 days ago
It would be great if (in the repo) you could briefly explain what fuzzing is and why you'd need it. I assume it's some sort of obfuscation tool?

There's a good intro here: https://www.microsoft.com/en-us/research/blog/a-brief-introd... and afl++'s main documentation is here https://aflplus.plus/ which talks a bit about it.

The goal is to find bugs in code by throwing random data at it, in as intelligent a fashion as possible. You can do that a few ways:

* Give structured data to mutate a bit.

* Just throw random data at it. You could do this with any binary that accepts data either via stdin or from a file.

* Instrument the code, throw random data at it and see what paths of code get triggered and feed that back into the data generator. Drawback is you need to be able to compile all the code involved, so it gets fully instrumented.
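
The first two approaches can be sketched in a few lines of Python. This is only an illustration: the helper names `random_input` and `mutate` and the JSON-looking seed are made up for the example, and real fuzzers use far richer mutation strategies than single bit flips.

```python
import random

def random_input(rng, max_len=64):
    """Second approach: purely random bytes, no knowledge of the format."""
    length = rng.randrange(1, max_len + 1)
    return bytes(rng.randrange(256) for _ in range(length))

def mutate(seed, rng, flips=3):
    """First approach: start from structured data and flip a few random bits.

    Assumes `seed` is non-empty.
    """
    data = bytearray(seed)
    for _ in range(flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)  # flip one bit in one byte
    return bytes(data)

rng = random.Random(0)                      # fixed seed so runs are repeatable
seed = b'{"name": "fuzz", "ok": true}'      # hypothetical structured sample
print(random_input(rng))
print(mutate(seed, rng))
```

Either output would then be handed to the program under test via stdin or a file. The mutated seed stays close to something the parser accepts, so it tends to get deeper into the code than pure noise does.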

AFL/AFL++ sits in the third camp. You compile your code using it, and it then uses the information it gets back to figure out ways to trigger code paths by applying intelligent mutations. For example, you can take code that parses a PNG image file, start AFL++ off with no initial data, and it will fairly quickly start producing valid PNG images.

It's a very effective approach for finding bugs. On the AFL++ site there's a small trophy cabinet, and AFL has a larger one (older project) https://lcamtuf.coredump.cx/afl/.

> it's some sort of obfuscation tool?

I didn't expect someone on HN not to know this but then there are not only programmers here I guess ;=)

It's a tool to find bugs. To strongly oversimplify: It throws random inputs at a program until it crashes.

So you could say it's a tool to complement a test suite.

https://xkcd.com/1053/

(edit: not to imply you were making fun, your answer was great! But in general: everyone learns things every day. Even every programmer has a day where they learn what fuzzing is)

You're right, my answer was kinda impolite, I apologize.
This term is not used much. Most people know more about "random testing" or "monkey testing", which are much more common out there. I think fuzzing is used a lot to find security holes, and I think it is fairly old but still widely used, yet it is not something programmers outside of systems programming see everywhere. Not all programmers work in the same field, so it is not uncommon for someone not to know about this. In my case, I associate the term fuzzing with matching, for example.
Somewhat. I think it might mostly be that it provides a much greater return for those using languages where incorrectly handled values have a higher chance of causing much worse problems, like C and C++. I think if you write in those languages, or like me you haven't for almost 20 years but you're just still very interested in developments about them because they often seem to illuminate the weird quirks of computing and CPUs, then fuzzing is a much more common thing to have heard about.

Not that fuzzing isn't useful for higher level or managed languages, just that it's extra useful when you throw likely segfaults into the mix.

Fuzzing is ROI efficient (especially for time invested) even if you don't intend to find a segfault, but just want to see how a program works or performs across different input states either in or out of its usual domain (and you can direct the fuzzing in many ways: derandomizing it, constraining the search space, or using a virtualizer like QEMU). I like to think of it as "semantics engineering" with spare CPU cycles.

I use fuzzers with a Redex driver usually, which is unusually great at intelligently driving fuzzers: https://docs.racket-lang.org/redex/index.html

Fuzzing is a technique where you send a lot of random or not-so-random data to the input of a program to see how it reacts: does it crash, does it handle the data properly, etc.

For example, say you want to test your JSON parser: what happens if I send "{", ""\\{", etc.?
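
A rough sketch of that idea, using Python's standard `json.loads` as a stand-in for "your JSON parser". The function name `fuzz_json`, the alphabet, and the rule "a clean `ValueError` is fine, any other exception is a finding" are assumptions made for this example, not part of any real fuzzer.

```python
import json
import random

def fuzz_json(parse, rounds=1000, seed=0):
    """Feed short random JSON-ish strings to `parse` and collect surprises."""
    rng = random.Random(seed)
    alphabet = '{}[]",:\\ 0123456789truefalsn-.e'
    findings = []
    for _ in range(rounds):
        length = rng.randrange(1, 20)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            parse(text)
        except ValueError:
            pass  # clean rejection; json.JSONDecodeError subclasses ValueError
        except Exception as exc:
            findings.append((text, exc))  # anything else is worth a look
    return findings

# Hand-picked malformed inputs: both should be rejected cleanly.
for bad in ['{', '"\\{"']:
    try:
        json.loads(bad)
    except ValueError as exc:
        print(repr(bad), "rejected:", exc)

print(len(fuzz_json(json.loads)), "unexpected exceptions")
```

To test a parser of your own, pass it in place of `json.loads` and adjust the `except ValueError` clause to whatever error it is supposed to raise on bad input.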

Fuzzers can find defects, including vulnerabilities, that might be missed by other tools. AFL used a newer technique, called "coverage-guided" fuzzing, that turned out to be a remarkable improvement. As a coverage-guided tool, it monitors how many times various code branches are taken, and if the count is different from what it has seen before, the input is considered "more interesting". AFL++ inherits this capability.
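
A toy version of that feedback loop, to make the idea concrete. This is not how AFL is implemented: the `target` function, the `b"FUZZ"` magic value, and the hand-written `trace` set are all hypothetical. AFL gets its coverage from compile-time instrumentation and tracks branch hit counts, while this sketch only records which checks an input got past.

```python
import random

def target(data, trace):
    """Hypothetical program under test: fails only on input b'FUZZ'."""
    magic = b"FUZZ"
    for i, expected in enumerate(magic):
        if len(data) <= i or data[i] != expected:
            return
        trace.add(i)  # stand-in for instrumentation: check number i passed
    raise RuntimeError("bug reached")

def mutate(data, rng):
    """Overwrite one random byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)

def fuzz(rounds=200_000, seed=0):
    rng = random.Random(seed)
    corpus = [b"\x00\x00\x00\x00"]  # uninformative starting input
    seen = set()                    # coverage observed so far
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        trace = set()
        try:
            target(candidate, trace)
        except RuntimeError:
            return candidate        # crashing input found
        if not trace <= seen:       # new coverage: input is "more interesting"
            seen |= trace
            corpus.append(candidate)
    return None

print(fuzz())
```

The point of keeping "interesting" inputs is that the magic value gets discovered one byte at a time. A blind random 4-byte guess matches with probability 1 in 2^32 per try, whereas each step here only needs one right byte in one right position.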

An impressive demo (from AFL) is that it was able to figure out the required format for a JPEG file given only one text file (which is not a JPEG file): https://web.archive.org/web/20201210022938/https://lcamtuf.b...

If you're fuzzing open source software, you might consider applying to OSS-Fuzz https://github.com/google/oss-fuzz which provides a lot of free compute power to run fuzzers (so that vulnerabilities can be found & fixed).

The technique has been used for at least two decades in hardware verification, though the terminology is different. If you search the literature, you'll find terms like "constrained functional verification", "coverage directed test generation", "functional coverage directed test generation", and the like. The technique is the same, random testing, with mutation to try to hit more and more coverage points.
It goes back at least that far in software, with the original fuzzing work from U. Wisc and McKeeman's "Differential Testing for Software". Those are blackbox techniques; AFL's advance was using a general grey box approach.
The hardware approach isn't blackbox: it explicitly uses the reachable state space and constraint solving to reach more coverage points. To do this, the exact circuit representation is needed.