by saagarjha 2590 days ago
Reimplementing system calls is non-trivial, especially ones that have complex interactions with others (for example, the system calls related to process management). How do you prevent errors when translating this, and how do you implement features that ostensibly require calls to the OS anyways?

For sure, implementing Linux is no easy task, and there is no magic bullet. For compatibility testing, we have extensive system call unit tests [1] and also run many open source test suites. Language runtime tests (e.g., Python, Go) are particularly useful. We also perform continuous fuzzing with Syzkaller [2].

> how do you implement features that ostensibly require calls to the OS anyways?

gVisor's kernel is a user-space program, so it can and does make system calls to the host OS. Some examples:

* An application blocks trying to read(2) from a pipe. gVisor ultimately implements blocking by waiting on a Go channel. The Go runtime will in turn implement this with a futex(2) call to the host OS.

* An application reads from a file that is ultimately backed by a file on the host (provided by the Gofer [3]). This will result in a pread(2) system call to the host.
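
To make the first example concrete, here is a minimal sketch of a blocking read built on a Go channel. It is a toy model, not gVisor's actual pipe implementation: `toyPipe`, `newToyPipe`, `Read`, and `Write` are hypothetical names invented for illustration. The reader parks on the channel until a writer shows up, and any host-level waiting is left entirely to the Go runtime.

```go
package main

import (
	"fmt"
	"time"
)

// toyPipe is a hypothetical stand-in for a sandboxed pipe.
// It is NOT gVisor's real type; it only illustrates the blocking pattern.
type toyPipe struct {
	data chan []byte // unbuffered: a read waits for a matching write
}

func newToyPipe() *toyPipe {
	return &toyPipe{data: make(chan []byte)}
}

// Read blocks the calling goroutine until a writer hands over data.
// No host system call is issued here directly; the Go runtime decides
// how to park the underlying thread.
func (p *toyPipe) Read() []byte {
	return <-p.data
}

// Write blocks until a reader is ready to receive the data.
func (p *toyPipe) Write(b []byte) {
	p.data <- b
}

func main() {
	p := newToyPipe()

	// Simulate a writer that arrives a little later than the reader.
	go func() {
		time.Sleep(10 * time.Millisecond)
		p.Write([]byte("hello"))
	}()

	// The "application" read blocks here until the writer runs.
	fmt.Printf("read %q\n", p.Read())
}
```

The point of the sketch is that the sandboxed read never touches a host pipe. The only host interaction is whatever the runtime does to put an idle thread to sleep.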

The purpose here isn't to avoid the host completely (that's not possible), but to limit exposure to the host. gVisor implements the parts of Linux it supports using a much smaller subset of host system calls. Anything we don't use is blocked by a second-level seccomp sandbox around the kernel. For example, the gVisor kernel cannot make obscure system calls, or even open files or create sockets on the host (those operations are controlled by an external agent).
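
The allowlist idea can be modeled in a few lines. This is only a user-space sketch under stated assumptions: a real seccomp filter is a BPF program enforced by the host kernel, and the set below is made up for illustration, not gVisor's actual filter list. `allowedHostSyscalls` and `checkHostSyscall` are hypothetical names.

```go
package main

import "fmt"

// allowedHostSyscalls is an illustrative allowlist only. The real set
// of host system calls gVisor permits is defined elsewhere and is larger.
var allowedHostSyscalls = map[string]bool{
	"futex":   true, // used by the Go runtime to block threads
	"pread64": true, // reading from an already-open host file
}

// checkHostSyscall models the default-deny decision a seccomp filter
// makes: anything not explicitly allowed is rejected.
func checkHostSyscall(name string) error {
	if allowedHostSyscalls[name] {
		return nil
	}
	return fmt.Errorf("host syscall %q blocked by filter", name)
}

func main() {
	for _, name := range []string{"futex", "pread64", "open", "socket"} {
		if err := checkHostSyscall(name); err != nil {
			fmt.Println("deny: ", err)
			continue
		}
		fmt.Println("allow:", name)
	}
}
```

In this model, opening files and creating sockets are denied outright, which mirrors the claim that those operations have to go through an external agent rather than the sandboxed kernel itself.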

[1] https://github.com/google/gvisor/tree/master/test/syscalls/l...

[2] https://github.com/google/syzkaller

[3] https://gvisor.dev/docs/architecture_guide/overview/

How is this different than a nicer UI over a seccomp filter for your container?