Hacker News new | ask | show | jobs
Ask HN: What is the emerging state of the art in fuzzing techniques?
125 points by hwhatwhatwhat 3625 days ago
I'm fairly familiar with the popular tools such as afl and Codenomicon Defensics. But I find the academic literature very opaque and don't really know where to start.

If I want to understand the cutting edge of fuzzing techniques, and what will be the emerging state of the art in the next few years - where should I look? Any good papers or books (with at least some for novices to understand), or research projects that are leading towards a new excellence?

19 comments

in general, have a poke around https://fuzzing.info/papers/

First, I think the next big step in fuzzing will actually be a complement to fuzzing - solving.

AFL and friends can bitbang their way to massive code coverage, but can still fail on fairly simple testcases. Some recent research[1] by the authors of Angr[2] show that by pairing the brute-force coverage and exception discovery of a tool like AFL with constraint solving tools can really dig deep into a program, by actually solving the path to a given block of code. Microsoft's infamous SAGE fuzzer does this IIRC.

Second, I think there are still massive oportunities for fuzzing closed-source programs, as well as programs with tricky state, such as browsers or network daemons.

[1] https://www.internetsociety.org/sites/default/files/blogs-me...

[2] http://angr.io

Has anyone tried combining fuzzing with 'all-pairs'/'pairwise' test case generation?

The idea of pairwise testing is that individual features in a program are commonly tested, but combinations of them are often poorly tested. However, trying to test all features with each other soon becomes a combinatorial nightmare. To deal with this, you use an algorithm (e.g. see the code for 'allpairs' at http://www.satisfice.com/tools.shtml ) to pick a minimal set of test cases that cover all possible pairs of configuration settings.

These test cases could then be used as starting points for fuzzing, to provide a greater code coverage faster.

NIST has done some research on combinatorial software testing and has a pretty good package [1] I've used before for generating test cases given N dimensions.

The idea behind their software is to maximize the effectiveness of test time because testing those N dimensions exhaustively is infeasible.

[1] http://csrc.nist.gov/groups/SNS/acts/index.html

I'm the second author on [1], if anyone has any questions.
Thank you. Things like this are what make HN awesome :)
Why "infamous"?
I have no idea why I wrote that, bad wording.
I recommend https://www.usenix.org/search/site/fuzzing.

ACM library is fine as well, although I find more security-related journals in USENIX database.

I am very interested in learning about this too! I started out with AFL for fuzzing but soon had to move to LLVM's LibFuzzer because I didn't want non-ASCII inputs (by design, we know we wouldn't get that) and also SantizerCoverage seemed to be more robust than the 64kB shared memory array that AFL uses for large programs.

However, libFuzzer being an in-process fuzzer has again created a lot of headache - especially in places where we malloc stuff and expect free to implicitly happen at exit - in libFuzzer's case, the exit is caught and the entrypoint function is restarted causing memory leaks and OOM crashes. This made me have to include #ifdef FUZZ ... #endif lines in the codebase - adding different behavior in fuzzed and unfuzzed cases which felt wrong.

I have considered implementing an out-of-process fuzzer from scratch (or base it off AFL), but have been holding off till I get time to read about more prior work given that this is not of the highest priority at work.

That said, SAGE from Microsoft seems really interesting[1]. It generates inputs intelligently by constraint-solving on inputs to conditional statements. It isn't exactly new though.

[1] http://research.microsoft.com/en-us/um/people/pg/public_psfi...

Bear in mind that while AFL may feed you a binary string, you can "decode" that into whatever you want. It may take a bit of creativity or bit-shifting, but it can be done. You're free to grab 8 bits off the front and start setting global run-time flags or something, for instance. It's just data.

Also due to the way that AFL works, if you have a branch at the beginning of your program that immediately exits if it sees non-ASCII input, you don't lose all that much time, because AFL sees that as the same branch being exercised over and over again and hammers only on the input that allowed it to progress past that. In fact I think I almost always use AFL on text-based protocols, and it works fine. It's a common use case for AFL.

While I didn't internally decode the bitstring, you are right. AFL did generate tonnes of useful tests and I did uncover a couple of bugs using it.

That said, given that I didn't spend too much time actually trying to understand AFL, could you clarify if I was right in my understanding that AFL doesn't have true coverage but rather a heuristic using a table as documented in http://lcamtuf.coredump.cx/afl/technical_details.txt . Given that the program I was fuzzing was HUGE, wouldn't it falsely alias branches?

Thank you for your input!

It could probably go either way, depending on the nature of your program. It depends on the correlation between "separate paths of execution" and "separate test cases". Certainly that's strong for a lot of programs, but if I sat down to construct pathological cases I probably could, and in the world of Turing chaos that programs inhabit, if pathological cases can exist, you can count on hitting them sometimes, no matter what low probability of that outcome you think you can justify.

AFL is definitely heuristic, and thus can conceivably be fooled in places where a true symbolic execution wouldn't be. On the other hand, it's very fast and easy to set up and use. Can't ever have it all. :)

I've started using clojure.spec[0] in my regular day programming with an eye on using generative testing, which I understand is a form of fuzzing. I'm very new to this, but it feels incredibly practical in terms of bang for buck - like the 'cutting edge' of practical use. I'm not sure what it's academic background is, but I'd highly recommend reading and listening to what Rich Hickey has to say about it. He's a smart guy.

[0]http://clojure.org/about/spec [1]http://blog.cognitect.com/cognicast/103

I'll definitely second this one, I'm still getting comfortable with Clojure Spec and the hooks into test.check (it's brand new, and I'm unfortunately no longer doing Clojure for my day job) but it's already great and I'm only at the beginner/intermediate stage.
For people as confused about this post as I was:

https://en.wikipedia.org/wiki/Fuzz_testing

There is a 30 seconds video based on 90 minutes of a collaborative experiment by 22 fuzzing practitioners. Aim was to write down couple of points about fuzzing state of the art.

Video: https://www.youtube.com/watch?v=UrhRUKgeDQI

Text: https://github.com/ouspg/ouspg-open/blob/master/presentation...

Updates e.g. as pull requests most welcome. :)

DARPA's Cyber Grand Challenge is pushing the state of the art when it comes to RE, exploitation, and the like. Very presentable talk by Mike Walker, who's heading up the project with a bunch of ex-Raytheon guys.

There are links to some repos of in the talk: it's not exactly what you're looking for, but if you're interested, it's a good resource.

https://www.youtube.com/watch?v=ejPghbtAG58

Here you have some interesting work from Fabien Duchene, ENSIMAG/CEA researcher, about black-box genetic fuzzing (I know it sounds like a lot of buzzwords, and in fact it was a little bit mocked during SSTIC 2016, but it's some really good stuff !)

http://hal.univ-grenoble-alpes.fr/hal-00978844/ https://dl.acm.org.sci-hub.cc/citation.cfm?id=2557550&dl=ACM...

I don't know if adding a sci-hub link is a good idea.
"This put Dan in a dilemma. He had to help her—but if he lent her his computer, she might read his books. Aside from the fact that you could go to prison for many years for letting someone else read your books, the very idea shocked him at first. Like everyone, he had been taught since elementary school that sharing books was nasty and wrong—something that only pirates would do."
Why not?
This has been discussed to no end in the past. When I published my paper, I did so in a venue that had open-access to all its proceedings (HotPower 12). However CODASPY where this paper comes from isn't one of those venues. ACM explicitly denies permission to host the papers elsewhere - From the paper:

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

While we might question the ethics or morality of closed-access conference proceedings, it is definitely on the wrong side of legality. Much like patents are evil but can be defended/enforced by big corporations legally.

IMHO, professors and students who take public grants should publish only to open-access conferences and journals. If they do not do so, it does not justify illegally downloading the paper.

Would you accompany a movie recommendation with a piratebay link on HN?
Except this research was (most probably, I did not check) funded with french or european taxpayers' money, and therefore people should not have to pay for it twice.

The moral standpoint is very different than with movie or music pirating.

We should try to extend the First Sale Doctrine to digital publications. If it hasnt been tried already.
It's standard practice in many, many online fora to accompany song recommendations with YouTube links.
This is not exactly what you asked for, but if the code you want to fuzz is written by yourself, you could learn fuzzing by doing the fuzzing yourself. It might also be easier to do this at first, since you're closer to your code, and would need less adapters for existing fuzz solutions.

1. Write a simple test function which will generate a very wide range of allowed inputs to the function you want to test. Try to generate average inputs most of the time, and outliers some of the time. Use a seeded Mersenne Twister as your random number generator. For example, if the function you are testing accepts an array of buffers, then for a single test of the function, you could choose at random how many buffers to generate, and then at random the length of each buffer, and then at random the contents of each buffer. You could then call the function many times, each time with a different array of inputs. Or if you were testing a document editor or CRDT, you might want to randomly generate different combinations of user edits, e.g. a delete 10% of the time, an insert 50% of the time, etc.

2. Write the simplest possible independent implementation of the function you want to test. For example, if you are testing a custom hash map, you could use the hash map from your standard library as the basis for the independent implementation. Or if you were testing a key/value storage engine, you could consider using an in-memory hash as the basis for the independent implementation.

3. Run your random fuzz inputs from step 1 through both your implementations and assert that the outputs of both are always the same at each step. Both implementations could be called a few thousand times depending on the run time.

Checkout TriforceAFL as well, which allows using AFL (American Fuzzy Lop) inside a qemu virtual machine.

This is a pretty interesting writeup on it: https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...

shameless self plug: https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...

allows you to run AFL on arbitrary VMs. so far we've used it to find some Linux vulnerabilities, and are starting to find stuff in other operating systems too. and we're just getting started :)

I read this recently via HN. Congrats, this is impressive.
You could take a look at the proceedings of conferences like ICSE (International Conference on Software Engineering). For example there's been a lot of work over the last decade or so on making dynamic symbolic execution techniques more practical (as exemplified by e.g. KLEE or SAGE).
My friend is working on some next gen fuzzing stuff here:

https://github.com/2trill2spill/nextgen

Thanks for the shout out, But I'm the nextgen author If anyone has any questions, comments, bugs, whatever. The project was started because of frustrations with AFL and trinity. The "novel" feature at this point is using dtrace for instrumentation instead of a compiler plugin like AFL. This allows for instrumenting closed sourced applications without using qemu and non C/C++ applications. I have more planned for the future but I'm trying finish the core of nextgen, mainly the logging, interprocess communication, and concurrency model.
I worked on the cyber reasoning system (CRS) at Trail of Bits for our entry into the Cyber Grand Challenge [1]. Some slides describing the system are here [2].

Specifically, I implemented our fuzzer. I created a dynamic binary translator [3] that emulated the DECREE [4] operating system and x86 arhcitecture. It had the Radamsa [5] mutator built-in, along with a number of other simpler mutators.

I think our fuzzer out-performed our competitors, though I am biased ;-) The fuzzer was single-threaded, but could perform more than a million fuzz/mutate-execute (with coverage) iterations every two hours. Before I optimized it, it beat the pants off PIN [6]. We ran many such fuzzer processes concurrently. They would saturate the CPUs, and actually performed no I/O because I emulated all I/O in memory ;-) This was key to us achieving such high-throughput.

Our fuzzer wasn't super smart (though Radamsa is), but it benefited a lot from a feedback loop with our symbolic executors [7]. The symbolic executors would produce inputs that would then get fuzzed. These inputs could feed back into the symbolic executors, etc. That added more brains to our system.

All in all, we ran the CRS across something like 300 large EC2 nodes (across three availability zones). Per node, 8 or so fuzzers processes were running constantly for 24 hours. I'd ballpark that as 28.8 billion mutate+execute cycles.

In conclusion, the key for us was to make a production-quality, high-throughput fuzzer that did only one thing really well and really fast, then complement it with other more sophisticated tools like symbolic executors.

[1] https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-... [2] http://infiltratecon.com/archives/Slides_Artem_Dinaburg.pdf [3] https://en.wikipedia.org/wiki/Binary_translation [4] https://github.com/CyberGrandChallenge/libcgc [5] https://github.com/aoh/radamsa [6] https://software.intel.com/en-us/articles/pin-a-dynamic-bina... [7] https://en.wikipedia.org/wiki/Symbolic_execution

It's impressive how many resources you threw at problem! We (Shellphish) had very similar results by using AFL [4] for fuzzing and angr [5] for symbolic execution (we published our approach at NDSS in February [1]) on around 300 cores. Of course, we procrastinated pretty heavily, ended up hacking our CRS together in three weeks, and it was absurdly rough around the edges and didn't get anywhere near your crash numbers during the qualifying event itself. As we discuss in the paper, our experiments with the impressive numbers were carried out afterwards, in less chaotic conditions.

It's interesting that the approaches taken by us [1], you [2], and ForAllSecure [3] for the CQE (at least on the exploitation side) were so similar. I've talked with two other teams that had an analogous setup (as well as two other teams, who did quite well, that took a very different route). I guess some great minds think alike!

As a side note, in the ToB blog post, you talk about wanting to join up with another team to be able to play in the final event. Did you guys end up finding a partner? It'd be interesting to face your CRS again next month :-)

[1] https://www.internetsociety.org/sites/default/files/blogs-me... [2] https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-... [3] https://blog.forallsecure.com/2016/02/09/unleashing-mayhem/ [4] http://lcamtuf.coredump.cx/afl/ [5] http://angr.io

We ran three versions of our system, because in the last few weeks/days before the quals we had a bunch of regressions. One of those regressions was caused by a single line change in my fuzzer :-p On the twitter account we used to track our progress, we used rapper names for each version. One version was two weeks old, one was up-to-date with the fixed fuzzer, and one had an experimental version of one of our symbolic executors that produced <var> tags. That didn't work.

We also under-utilized those nodes :-( Each node had at at least 4 idle cores wasting our money. Our resource allocation mechanism was naive.

I looked through some of the stuff released byt DARPA after the event and they released some of our PoVs as official PoVs. If you hex-decode them, you'll see something like "bad seed to Radamsa"!! That was a bug in how I would invoke Radamsa -- sometimes I'd pass it a seed that was way too big.

We tried to team up with every team but ForAllSecure. No one wanted to have our name on their ticket, or they were just fishing for details :-/ We've done a bit of work on the system since, getting it to work on Linux programs via a "port" of parts of libc to DECREE.

You didn't approach us, either; bummer! I doubt we'd have been down to team up outright, but maybe some of you guys could have done an internship in our lab if you really wanted to work on the CGC? I guess it's ancient history now, though.

Resource utilization is definitely tricky. But man, the amount of resources you had is just mind-boggling! I just realized it's even more cores than DARPA gave us for the final event! I'm impressed you guys managed to keep it all running smoothly (at least, it seemed that way from our lab, where there was complete chaos as our system fell over and crapped itself repeatedly for the first few hours).

Hey Zardus, I'm sure the info reached you. What team are you? We approached everyone either indirectly through broadcast emails or directly through me calling/emailing.

I'm not sure an internship would have been what we were looking for. If I had to guess, that's probably why we didn't pursue it further.

Interestingly, Radamsa is implemented in Scheme, then transpiled to C. Originally I played around with invoke Radamsa as a server (it's normal usage model), but this wasn't ideal because I wanted to use it at varying granularities, which would have meant multiple invocations per mutate/execute cycle. What I ended up doing was to take the compiled Scheme, get rid of all the syscalls, link it directly into the fuzzer program binary, and turn its `main` function into something I could call directly!
Badass tech and fuzzing results in your main comment. :) What Scheme or transpiler did you use, though?
You might not be as far from the cutting edge as you'd expect.

From what I've seen, fuzzing is divided into two major camps (I'm generalizing to the extreme here):

1. Mutational - These include tools like AFL, are gaining traction in the open source community, and have a lot of applications, perhaps most notably in library and application fuzzing.

2. Generational - These include commercial tools Defensics and PeachFuzzer, and open source tools like Peach, Spike, and Sulley. The state of the art is held by commercial offerings in this camp, and it's what businesses are more likely to be interested in.

My hypothesis as to the reason for this split: Open source hackers are interested in finding bugs. Businesses are interested in assurance that their software is safe ("safe"). Protocol-specific tools give the impression that we've done the best we can at securing IP/TCP/TLS/HTTP/etc. Defensics is by far the dominant offering (in terms of apparent popularity), and Peach is the only active competitor I've ever found.

The open source generational branch is moving very slowly. The primetime candidate was once Peach, now called Peach Community [1]. Unfortunately the corporate backer switched to a closed solution, and left the open source tool out to dry. The latest tool of note besides Peach is Sulley [2] [3].

Books: I haven't found any books that go below the surface. "Fuzzing: Brute Force Vulnerability Discovery" has decent reviews on Amazon, but I found it more breadth than depth.

Papers:

1. IMO the seminal paper on fuzzing is Rauli Kaksonen's thesis, "A Functional Method for Assessing Protocol Implementation Security." [6] This will take you almost to the state of the practice. Kaksonen was a co-founder of Codenomicon. Very interesting read.

Talks: If you want cutting edge research, conference talks and blog posts may be as good as papers.

1. 2007 Blackhat conference Sulley talk "Fuzzing Sucks! - Introducing Sulley Fuzzing Framework" [2]

2. Google Charlie Miller fuzzing. My favorite slide decks are [7] and [8]. High fives (and a beverage on me should time and space ever permit) to anyone who can find audio or video from the actual talks.

Shameless plug(s):

1. Due to lack of response on Sulley pull requests, I forked to a new project called boofuzz [4], and I commit to at least address pull requests more quickly.

2. I'll be giving a fuzzing talk at Defcon 24's Packet Hacking Village which will address, among other things, the state of open source fuzzing [5].

[1]: http://www.peachfuzzer.com/resources/peachcommunity/

[2]: http://www.podcast.tv/video-episodes/pedram-amini-aaron-port...

[3]: https://github.com/OpenRCE/sulley

[4]: https://github.com/jtpereyda/boofuzz

[5]: https://www.wallofsheep.com/pages/dc24

[6]: http://www.vtt.fi/inf/pdf/publications/2001/P448.pdf

[7]: https://cansecwest.com/csw08/csw08-miller.pdf

[8]: http://pages.cs.wisc.edu/~rist/642-fall-2012/toorcon.pdf

I would have a look at Project Zero.

P.S. I think many governments and corporations would keep their fuzzing techniques quite secret. You don't want to do the same fuzzing as anyone else.

Really?

I thought everyone dropped "security by obscurity" long time ago.

I was actually thinking along the lines of fuzzing for exploit development.

You want a unique bug that will last a long time. In which case, your fuzzing techniques are a trade secret. A lot of fuzzing advances take place behind closed doors.

Project Zero has some people with interesting backgrounds doing bug hunting for good.

I hold the belief that Google must be getting something else out of Project Zero other than just "we hire the best hackers" bragging rights.

I figure they're selling exploits (the ones they don't publicise) to governments.

contrary to popular opinion....

Obscurity is effective as one layer in a layered defense.

"Defense in depth".

By itself, sure: but it can only help if used in tandem with other methods
curious how people who have been doing fuzz testing feel about property-based testing (at least the part about generating inputs, not asserting results)
You should watch for upcoming presentations of Mateusz Jurczyk @j00ru . I'm not entirely sure where he's going to present next though.