Hacker News new | ask | show | jobs
by prattmic 2586 days ago
> From what I understand, basically a user-space program that wraps your container and intercepts all system calls. You can then allow/deny/re-wire them (based on a config).

gVisor actually intercepts and implements the system calls in the user-space kernel. Two specific goals of gVisor are that (1) system calls are never simply allowed and passed through to the host kernel, and (2) you don't need to write a policy configuration for your application; just put your application inside gVisor and go. These are significant differences over simply using something like seccomp on its own (what the architecture guide calls "Rule-based execution").

Some of this is covered in our security model: https://gvisor.dev/docs/architecture_guide/security/#princip...

2 comments

Reimplementing system calls is non-trivial, especially ones that have complex interactions with others (for example, the system calls related to process management). How do you prevent errors when translating this, and how do you implement features that ostensibly require calls to the OS anyways?
For sure, implementing Linux is no easy task, and there is no magic bullet. For compatibility testing, we have extensive system call unit tests [1] and also run many open source test suites. Language runtime tests (e.g., Python, Go, etc) are particularly useful. We also perform continuous fuzzing with Syzkaller [2].

> how do you implement features that ostensibly require calls to the OS anyways?

gVisor's kernel is a user-space program, so it can and does make system calls to the host OS. Some examples:

* An application blocks trying to read(2) from a pipe. gVisor ultimately implements blocking by waiting on a Go channel. The Go runtime will ultimately implement this with a futex(2) call to the host OS. * An application reads from a file that is ultimately backed by a file on the host (provided by the Gofer [3]). This will result in a pread(2) system call to the host.

The purpose here isn't to avoid the host completely (that's not possible), but to limit exposure to the host. gVisor can implement all the parts of Linux it does on a much smaller subset of host system calls. Anything we don't use is blocked by a second-level seccomp sandbox around the kernel. e.g., the kernel cannot make obscure system calls, or even open files or create sockets on the host (those operations are controlled by an external agent).

[1] https://github.com/google/gvisor/tree/master/test/syscalls/l...

[2] https://github.com/google/syzkaller

[3] https://gvisor.dev/docs/architecture_guide/overview/

How is this different than a nicerUI over a seccomp filter for your container?
Awesome, thanks. I need to dig into this a little and just run a few demos / labs. This makes sense though. I really like your comments on this thread too (https://news.ycombinator.com/item?id=16976392).