Hacker News new | ask | show | jobs
by zomglings 2018 days ago
Your YouTube looks awesome, and I just followed you on Twitch. Looking forward to a marathon stream. :)

A question about fuzzing and determinism:

It seems like this requirement of determinism is connected to (and reinforces) a brute force approach to discovering problematic inputs.

This precludes strategies like performing a truly stochastic search (where you put complete faith in the randomness) over your input space.

Are people attempting this kind of thing? Are there additional requirements to fuzzing that make probabilistically finding a thin trajectory through the input space undesirable?

2 comments

Determinism is already fairly important in fuzzing for 2 major reasons.

One, is that having determinism makes it easier to triage bugs. If I find a crash while fuzzing, but there's no way to reproduce it, it's going to pretty much never get fixed (unless the call stack from the first crash is obvious enough to prepare a fix). For fuzzing systems, this is effectively mandatory. Imagine a memory allocation failure leading to a NULL-deref, the odds of hitting that same bug are so low because it requires going OOM at the same point in a subsequent execution. Snapshot fuzzing (each fuzz case starts from a snapshotted state of memory/registers/whatever) mitigates some of this, but there's still noise from context switches that will affect RNGs, which will affect heap layouts, which will affect everything on the system. That being said, most fuzzing out there publicly is not system's fuzzing, and thus this is typically not something people think about.

But two, is what most people use determinism for. In modern fuzzing, coverage guidance is pretty much mandatory. This means when new code is hit, to save off the input such that it can be built upon. At a very simple level, this means a problem which is 256^4, turns into a 256*4, as all requirements do not need to be satisfied simultaneously, as long as the previous requirements cause new code to get hit they can be built upon. Of course, if the program does not behave the same way every time, the noise can start to erode the value which is provided here, since you're not actually building upon the same thing.

I'm not sure if this answers the question.

I think you're mixing two different concepts up here.

Determinism in the context of fuzzing is related to the reproducibility of a program state. It's deterministic in the sense that all the inputs to the system are known and controlled. This allows us to repeat all inputs and reproduce the exact same behaviour as before, e.g. an error state we stumbled upon or an interesting program state we want to continue exploring.

This in no way precludes sampling the input space stochastically. Brute forcing by sampling all possible inputs sequentially is usually untenable and wasteful. However, once you do encounter a new program state, you'll be able to perfectly recreate it forever.

I think you're right, I was mixing two concepts and my question wasn't really about determinism.

Writing a specialized OS suggests to me that someone is very focused on... the best way I can describe it is cutting a fat trajectory through the input space. I am curious if anyone is spending their effort on sparser (but more intelligent) sampling of the input space instead.

Yes, there's a lot of work being done on more intelligent fuzzing. To throw some terms into the mix, there's coverage-guided fuzzing (which is now an old technique), concolic testing (which combines concrete execution with symbolic execution in order to reach new branches in a targetted way) and grammar fuzzers (which generate valid inputs according to a grammar).

These are not really mutually exclusive with the type of work gamozolabs is doing because even with hyperintelligent input generation, you still ideally want raw speed.