by daniel-levin 1299 days ago
Neat! This is the direction I’d hoped to see gVisor go in. What’s the reasoning for building from scratch and not piggybacking off gVisor?

We certainly looked into gVisor and Firecracker when we started this project a few years ago. These systems use KVM and gVisor in particular uses the Model Specific Registers (MSRs) to intercept system calls before forwarding them to the host kernel. Intercepting syscalls this way has less overhead than ptrace and we would have complete control over the system environment. I think it's a good approach and worth exploring more, but ultimately the deal breaker was that KVM requires root privileges to run and it wouldn't run on our already-virtualized dev machines. We also wanted to allow the guest program to interact with the host's file system. So, we went with good ol' ptrace. Last I checked gVisor also has a ptrace backend, but it wasn't very far along at the time. When going the ptrace route, there is less reason to depend on another project. Another reason of course is that we'd be beholden to a Google project. ;)
I thought it was very cool how gVisor is multi-backend (their “sentry” can be implemented via either ptrace or KVM), which is pretty unusual for instrumentation tools.

We could maybe have shared this logic to intercept syscalls and redirect them to user space code serving as the kernel. That is, we could have shared the Reverie layer. We saw ourselves as headed towards an in-guest binary instrumentation model (like rr’s syscall buffer). And so one factor is that Rust is a better fit than Go for injecting code into guest processes.

Regarding the actual gVisor user space kernel: we could have started with that and forked it to start adding determinism features to that kernel. At first glance that would seem to save on implementation work, but “implement futexes deterministically” is a pretty different requirement from “implement futexes”, so it’s not clear how much savings could have been had.
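
To make the futex distinction concrete, here is a minimal sketch of what the "deterministically" part might add. It is not gVisor's or Hermit's actual code. `DetFutexTable`, `Tid`, and the oldest-first wake policy are hypothetical names and choices, and the sketch assumes some deterministic scheduler decides the order in which `wait` calls are admitted. A plain futex implementation can wake any waiter; this one must always pick the same ones.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Logical thread id, assumed to be assigned by a deterministic scheduler.
pub type Tid = u32;

/// Hypothetical futex table that wakes waiters in a reproducible order.
#[derive(Default)]
pub struct DetFutexTable {
    /// Waiters per futex address, in the order they were admitted.
    waiters: BTreeMap<u64, VecDeque<Tid>>,
}

impl DetFutexTable {
    /// Record that `tid` is now blocked on the futex word at `addr`.
    pub fn wait(&mut self, addr: u64, tid: Tid) {
        self.waiters.entry(addr).or_default().push_back(tid);
    }

    /// Wake up to `count` waiters on `addr`, always oldest-first, so the
    /// result depends only on the recorded admission order and never on
    /// host scheduling.
    pub fn wake(&mut self, addr: u64, count: usize) -> Vec<Tid> {
        let woken: Vec<Tid> = match self.waiters.get_mut(&addr) {
            Some(queue) => {
                let n = count.min(queue.len());
                queue.drain(..n).collect()
            }
            None => Vec::new(),
        };
        // Drop empty queues so the table does not grow without bound.
        if self.waiters.get(&addr).map_or(false, |q| q.is_empty()) {
            self.waiters.remove(&addr);
        }
        woken
    }
}
```

If threads 3, 1, and 2 wait on the same address in that order, `wake(addr, 2)` returns threads 3 and 1 on every run. The hard part in practice is making the admission order itself reproducible, which this sketch takes as given.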

We could still have a go at reusing their KVM setup to implement a Reverie backend. But there’s some impedance matching to do across the FFI there, with the Reverie API relying on Rust’s async concurrency model and Tokio. Hopefully we could cleanly manage separate thread pools for the Go threads taking syscalls vs the Tokio thread pool hosting Reverie handler tasks. Or maybe it would be possible to reuse their solution without delivering each syscall to Go code.
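
As a rough illustration of the two-pool split described above, here is a std-only sketch in which one side produces trapped syscalls and a separate pool of handler threads consumes them over channels. Everything here is an assumption for illustration: `SyscallEvent`, `Reply`, `handle`, and `run` are hypothetical names, plain OS threads stand in for both the Go threads and the Tokio runtime, and no real FFI boundary is crossed.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A trapped syscall as handed across the boundary (hypothetical shape).
pub struct SyscallEvent {
    pub tid: u32,
    pub number: u64,
}

/// The handler's result, sent back to the side that trapped the syscall.
pub struct Reply {
    pub tid: u32,
    pub ret: i64,
}

/// Placeholder handler: a real one would emulate or forward the syscall.
fn handle(ev: &SyscallEvent) -> i64 {
    ev.number as i64
}

/// Feed `events` from a producer thread to `handler_threads` workers and
/// collect every reply. Reply order is not guaranteed.
pub fn run(events: Vec<SyscallEvent>, handler_threads: usize) -> Vec<Reply> {
    let (event_tx, event_rx) = mpsc::channel::<SyscallEvent>();
    let (reply_tx, reply_rx) = mpsc::channel::<Reply>();
    let event_rx = Arc::new(Mutex::new(event_rx));

    // Handler pool: stands in for the runtime hosting handler tasks.
    let handlers: Vec<_> = (0..handler_threads)
        .map(|_| {
            let rx = Arc::clone(&event_rx);
            let tx = reply_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving a single event.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(ev) => tx.send(Reply { tid: ev.tid, ret: handle(&ev) }).unwrap(),
                    Err(_) => break, // producer hung up, no more events
                }
            })
        })
        .collect();
    drop(reply_tx); // only the workers hold reply senders now

    // Producer: stands in for the threads taking syscalls.
    let producer = thread::spawn(move || {
        for ev in events {
            event_tx.send(ev).unwrap();
        }
    });

    let replies: Vec<Reply> = reply_rx.iter().collect();
    producer.join().unwrap();
    for h in handlers {
        h.join().unwrap();
    }
    replies
}
```

The point of the sketch is only that the channel is the sole coupling between the two pools, so each side can size and schedule its threads independently. A real backend would also have to route each reply back to the specific thread blocked on that syscall.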

> that KVM requires root privileges to run

It doesn't. It only requires privileges to access /dev/kvm.

Oops, yes, you are correct and it's not too hard to get around that by adding the user to a group that has access. Still, nested virtualization isn't always enabled, which I think limits the number of places we can run.
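
A small sketch of how a tool might tell these two failure cases apart at startup, using only the Rust standard library. `KvmAccess` and `probe_kvm` are hypothetical names. The sketch assumes that a missing device node typically means KVM is not available on the machine (for example, nested virtualization is disabled), while a permission error means the user is simply not in a group with access.

```rust
use std::fs::OpenOptions;
use std::io::ErrorKind;
use std::path::Path;

/// Outcome of probing a KVM device node (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum KvmAccess {
    /// The node could be opened read-write by the current user.
    Usable,
    /// The node does not exist; KVM is likely unavailable on this machine.
    Missing,
    /// The node exists but the current user may not open it.
    PermissionDenied,
    /// Any other I/O error.
    Other(ErrorKind),
}

/// Try to open `path` read-write, the way a KVM client opens /dev/kvm,
/// and classify the result.
pub fn probe_kvm(path: &Path) -> KvmAccess {
    match OpenOptions::new().read(true).write(true).open(path) {
        Ok(_) => KvmAccess::Usable,
        Err(e) => match e.kind() {
            ErrorKind::NotFound => KvmAccess::Missing,
            ErrorKind::PermissionDenied => KvmAccess::PermissionDenied,
            kind => KvmAccess::Other(kind),
        },
    }
}
```

Calling `probe_kvm(Path::new("/dev/kvm"))` and getting `PermissionDenied` points to the group fix mentioned above, whereas `Missing` points to the nested-virtualization limitation. Successfully opening the node does not by itself prove that virtual machines can be created.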