Hacker News new | ask | show | jobs
by scriptdevil 3632 days ago
I am very interested in learning about this too! I started out with AFL for fuzzing but soon had to move to LLVM's LibFuzzer because I didn't want non-ASCII inputs (by design, we know we wouldn't get that) and also SantizerCoverage seemed to be more robust than the 64kB shared memory array that AFL uses for large programs.

However, libFuzzer being an in-process fuzzer has again created a lot of headache - especially in places where we malloc stuff and expect free to implicitly happen at exit - in libFuzzer's case, the exit is caught and the entrypoint function is restarted causing memory leaks and OOM crashes. This made me have to include #ifdef FUZZ ... #endif lines in the codebase - adding different behavior in fuzzed and unfuzzed cases which felt wrong.

I have considered implementing an out-of-process fuzzer from scratch (or base it off AFL), but have been holding off till I get time to read about more prior work given that this is not of the highest priority at work.

That said, SAGE from Microsoft seems really interesting[1]. It generates inputs intelligently by constraint-solving on inputs to conditional statements. It isn't exactly new though.

[1] http://research.microsoft.com/en-us/um/people/pg/public_psfi...

1 comments

Bear in mind that while AFL may feed you a binary string, you can "decode" that into whatever you want. It may take a bit of creativity or bit-shifting, but it can be done. You're free to grab 8 bits off the front and start setting global run-time flags or something, for instance. It's just data.

Also due to the way that AFL works, if you have a branch at the beginning of your program that immediately exits if it sees non-ASCII input, you don't lose all that much time, because AFL sees that as the same branch being exercised over and over again and hammers only on the input that allowed it to progress past that. In fact I think I almost always use AFL on text-based protocols, and it works fine. It's a common use case for AFL.

While I didn't internally decode the bitstring, you are right. AFL did generate tonnes of useful tests and I did uncover a couple of bugs using it.

That said, given that I didn't spend too much time actually trying to understand AFL, could you clarify if I was right in my understanding that AFL doesn't have true coverage but rather a heuristic using a table as documented in http://lcamtuf.coredump.cx/afl/technical_details.txt . Given that the program I was fuzzing was HUGE, wouldn't it falsely alias branches?

Thank you for your input!

It could probably go either way, depending on the nature of your program. It depends on the correlation between "separate paths of execution" and "separate test cases". Certainly that's strong for a lot of programs, but if I sat down to construct pathological cases I probably could, and in the world of Turing chaos that programs inhabit, if pathological cases can exist, you can count on hitting them sometimes, no matter what low probability of that outcome you think you can justify.

AFL is definitely heuristic, and thus can conceivably be fooled in places where a true symbolic execution wouldn't be. On the other hand, it's very fast and easy to set up and use. Can't ever have it all. :)