| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by gamozolabs 2017 days ago

Oooh, wasn't really expecting this to make it to HN cause it was meant to be more of an announcement than a description.

But yes, I've done about 7 or 8 operating systems for fuzzing in the past and it's a massive performance (and cleanliness) cleanup. This one is going to be like an operating system I wrote 2-3 years ago for my vectorized emulation work.

To answer your QEMU questions, the goal is to effectively build QEMU with MUSL (just to make it static so I don't need a dynamic loader), and modify MUSL to turn all syscalls to `call` instructions. This means a "syscall" is just a call to another area, which will by my Rust Linux emulator. I'll implement the bare minimum syscalls (and enum variants to those syscalls) to get QEMU to work, nothing more. The goal is not to run Linux applications, but run a QEMU+MUSL combination which may be modified lightly if it means a lower emulation burden (eg. getting rid of threading in QEMU [if possible] so we can avoid fork())

The main point of this isn't performance, it's determinism, but that is a side effect. A normal syscall instruction involves a context switch to the kernel, potentially cr3 swaps depending on CPU mitigation configuration, and the same to return back. This can easily be hundreds of cycles. A `call` instruction to something that handles the syscall is on the order of 1-4 cycles.

While for syscalls this isn't a huge deal, it's even more emphasized when it comes to KVM hypercalls. Transitions to a hypervisor are very expensive, and in this case, the kernel, the hypervisor, and QEMU (eg. device emulation) will all be running at the same privilege level and there won't be a weird QEMU -> OS -> KVM -> other guest OS device -> KVM -> OS -> QEMU transition every device interaction.

But then again, it's mainly for determinism. By emulating Linux deterministically (eg. not providing entropy through times or other syscall returns), we can ensure that QEMU has no source of external entropy, and thus, will always do the same thing. Even if it uses a random-seeded hash table, the seed would be derived from syscalls, and thus, will be the same every time. This determinism means the guest always will do the same thing, to the instruction. Interrupts happen on the same instructions, context switches do, etc. This means any bug, regardless of how complex, will reproduce every time.

All of this syscall emulation + determinism I have also done before, in a tool called tkofuzz that I wrote for Microsoft. That used Linux emulation + Bochs, and it was written in userspace. This has proven incredibly successful and it's what most researchers are using at Microsoft now. That being said, Bochs is about 100x slower than native execution, and now that people have gotten a good hold of snapshot fuzzing (there's a steep learning curve), it's time to get a more performant implementation. With QEMU with get this with a JIT, which at least gets us a 2-5x improvement over Bochs while still "emulating", but even more value could be found if we get the KVM emulation working and can use a hypervisior. That being said, I do plan to support a "mode" where guests which do not touch devices (or more specifically, snapshots which are taken after device I/O has occurred) will be able to run without QEMU at all. We're really only using QEMU for device emulation + interrupt control, thus, if you take a snapshot to a function that just parses everything in one thread, without process IPC or device access (it's rare, when you "read" from a disk, you're likely just hitting OS RAM caches, and thus not devices), we can cut out all the "bloat" of QEMU and run in a very very thin hypervisor instead.

In fuzzing it's critical to have ways to quickly map and unmap memory as most fuzz cases last for hundreds of microseconds. This means after a few hundred microseconds, I want to restore all memory back to the state "before I handled user input" and continue again. This is extremely slow in every conventional operating system, and there's really no way around it. It's of course possible to make a driver or use CRIU, but these are still not exactly the solution that is needed here. I'd rather just make an OS that trivially runs in KVM/Hyper-V/Xen, and thus can run in a VM to get the cross-platform support, rather than writing a driver for every OS I plan to use this on.

Stay cute, ~gamozo

1 comments

zomglings 2017 days ago

Your YouTube looks awesome, and I just followed you on Twitch. Looking forward to a marathon stream. :)

A question about fuzzing and determinism:

It seems like this requirement of determinism is connected to (and reinforces) a brute force approach to discovering problematic inputs.

This precludes strategies like performing a truly stochastic search (where you put complete faith in the randomness) over your input space.

Are people attempting this kind of thing? Are there additional requirements to fuzzing that make probabilistically finding a thin trajectory through the input space undesirable?

link

gamozolabs 2017 days ago

Determinism is already fairly important in fuzzing for 2 major reasons.

One, is that having determinism makes it easier to triage bugs. If I find a crash while fuzzing, but there's no way to reproduce it, it's going to pretty much never get fixed (unless the call stack from the first crash is obvious enough to prepare a fix). For fuzzing systems, this is effectively mandatory. Imagine a memory allocation failure leading to a NULL-deref, the odds of hitting that same bug are so low because it requires going OOM at the same point in a subsequent execution. Snapshot fuzzing (each fuzz case starts from a snapshotted state of memory/registers/whatever) mitigates some of this, but there's still noise from context switches that will affect RNGs, which will affect heap layouts, which will affect everything on the system. That being said, most fuzzing out there publicly is not system's fuzzing, and thus this is typically not something people think about.

But two, is what most people use determinism for. In modern fuzzing, coverage guidance is pretty much mandatory. This means when new code is hit, to save off the input such that it can be built upon. At a very simple level, this means a problem which is 256^4, turns into a 256*4, as all requirements do not need to be satisfied simultaneously, as long as the previous requirements cause new code to get hit they can be built upon. Of course, if the program does not behave the same way every time, the noise can start to erode the value which is provided here, since you're not actually building upon the same thing.

I'm not sure if this answers the question.

link

feanaro 2017 days ago

I think you're mixing two different concepts up here.

Determinism in the context of fuzzing is related to the reproducibility of a program state. It's deterministic in the sense that all the inputs to the system are known and controlled. This allows us to repeat all inputs and reproduce the exact same behaviour as before, e.g. an error state we stumbled upon or an interesting program state we want to continue exploring.

This in no way precludes sampling the input space stochastically. Brute forcing by sampling all possible inputs sequentially is usually untenable and wasteful. However, once you do encounter a new program state, you'll be able to perfectly recreate it forever.

link

zomglings 2017 days ago

I think you're right, I was mixing two concepts and my question wasn't really about determinism.

Writing a specialized OS suggests to me that someone is very focused on... the best way I can describe it is cutting a fat trajectory through the input space. I am curious if anyone is spending their effort on sparser (but more intelligent) sampling of the input space instead.

link

feanaro 2017 days ago

Yes, there's a lot of work being done on more intelligent fuzzing. To throw some terms into the mix, there's coverage-guided fuzzing (which is now an old technique), concolic testing (which combines concrete execution with symbolic execution in order to reach new branches in a targetted way) and grammar fuzzers (which generate valid inputs according to a grammar).

These are not really mutually exclusive with the type of work gamozolabs is doing because even with hyperintelligent input generation, you still ideally want raw speed.

link