by saagarjha
2590 days ago
Reimplementing system calls is non-trivial, especially ones that have complex interactions with others (for example, the system calls related to process management). How do you prevent errors when translating this, and how do you implement features that ostensibly require calls to the OS anyways?
> how do you implement features that ostensibly require calls to the OS anyways?
gVisor's kernel is a user-space program, so it can and does make system calls to the host OS. Some examples:
* An application blocks trying to read(2) from a pipe. gVisor implements blocking by waiting on a Go channel. The Go runtime will ultimately implement this with a futex(2) call to the host OS.
* An application reads from a file that is backed by a file on the host (provided by the Gofer [3]). This will result in a pread(2) system call to the host.
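To make the first example concrete, here is a rough sketch of a blocking read built on a Go channel. The `pipe` type and its methods are hypothetical names made up for illustration; this is not gVisor's actual pipe implementation, just the general shape of "block the reader on a channel and let the Go runtime decide how to park the thread".

```go
package main

import "fmt"

// pipe is a hypothetical stand-in for a sandboxed pipe.
// It is NOT gVisor's real type; it only illustrates blocking on a channel.
type pipe struct {
	data chan []byte
}

// newPipe returns a pipe with room for one pending write.
func newPipe() *pipe {
	return &pipe{data: make(chan []byte, 1)}
}

// Read blocks the calling goroutine until a writer supplies data.
// No host system call is made here directly; the Go runtime decides
// how to put the underlying thread to sleep when it has nothing to run.
func (p *pipe) Read() []byte {
	return <-p.data
}

// Write hands data to a waiting (or future) reader.
func (p *pipe) Write(b []byte) {
	p.data <- b
}

func main() {
	p := newPipe()
	done := make(chan string)

	// The "application" reads from the pipe and blocks until data arrives.
	go func() { done <- string(p.Read()) }()

	p.Write([]byte("hello"))
	fmt.Println(<-done)
}
```

The point is that the blocking logic itself lives entirely in user space; only the runtime's thread parking reaches the host.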
The purpose here isn't to avoid the host completely (that's not possible), but to limit exposure to the host. gVisor implements the parts of Linux it supports using a much smaller subset of host system calls. Anything we don't use is blocked by a second-level seccomp sandbox around the kernel. For example, the kernel cannot make obscure system calls, or even open files or create sockets on the host (those operations are controlled by an external agent).
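As a toy model of the default-deny idea, the sketch below checks system call names against a small allowlist. Everything here is an assumption for illustration: the `allowed` table and the `permitted` function are hypothetical, the entries are just the calls mentioned in this comment, and a real seccomp filter is a BPF program enforced by the host kernel rather than a Go map lookup.

```go
package main

import "fmt"

// allowed is a hypothetical allowlist containing only the host calls
// mentioned above. The real gVisor filter list is larger and is
// installed as a seccomp BPF program, not consulted from Go code.
var allowed = map[string]bool{
	"futex":   true,
	"pread64": true,
}

// permitted reports whether a host system call is on the allowlist.
// Anything not listed is denied by default.
func permitted(name string) bool {
	return allowed[name]
}

func main() {
	for _, name := range []string{"futex", "pread64", "open", "socket"} {
		fmt.Printf("%-8s allowed=%v\n", name, permitted(name))
	}
}
```

Under this model, opening files and creating sockets are simply absent from the list, which matches the description of those operations being handled by an external agent.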
[1] https://github.com/google/gvisor/tree/master/test/syscalls/l...
[2] https://github.com/google/syzkaller
[3] https://gvisor.dev/docs/architecture_guide/overview/