Hacker News

by coppsilgold 699 days ago

If you think about it, virtualization is just a narrowing of the application-kernel interface. In a standard setting the application has a wide kernel interface available to it: dozens of syscalls even when filtered (e.g. with seccomp), hundreds otherwise. A vulnerability in any one of them could result in full system compromise.
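To make the "narrowing" concrete, here is a toy model in Python. It is not real seccomp, which installs BPF programs through `prctl`/`seccomp(2)`. The `KERNEL` table and the `SyscallFilter` class are hypothetical: only allowlisted handlers are reachable, and everything else is refused before any "kernel" code runs, so the exposed attack surface is just the allowlist.

```python
import errno

# Hypothetical "kernel": syscall name -> handler. Every entry is attack surface.
KERNEL = {
    "read": lambda fd, n: b"x" * n,
    "write": lambda fd, data: len(data),
    "open": lambda path: 3,
    "mount": lambda src, dst: 0,
    "ptrace": lambda pid: 0,
}


class SyscallFilter:
    """Toy allowlist filter in the spirit of seccomp (not the real mechanism)."""

    def __init__(self, kernel, allowed):
        # Keep only handlers that are both implemented and allowlisted.
        self.exposed = {name: kernel[name] for name in allowed if name in kernel}

    def call(self, name, *args):
        handler = self.exposed.get(name)
        if handler is None:
            # Denied calls never reach kernel code; mimic returning -EPERM.
            return -errno.EPERM
        return handler(*args)


jail = SyscallFilter(KERNEL, allowed={"read", "write"})
print(len(KERNEL), len(jail.exposed))      # full vs. narrowed surface
print(jail.call("write", 1, b"hi"))
print(jail.call("mount", "/dev/sda", "/mnt"))
```

Virtualization pushes the same idea further: instead of filtering the host kernel's syscalls, the guest only ever talks to the much smaller VMM interface.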

With virtualization the attack surface is narrowed to pretty much just the virtualization interface.

The problem with current virtualization (or, more specifically, with current VMMs) is that it can be cumbersome. Memory management, for example, is a serious annoyance. The kernel is built to hog memory for page cache and the like, but you don't want the guest doing that: you want to overcommit memory, since guests will rarely use 100% of what is given to them (especially when the guest is just a single jailed application). Workarounds such as free page reporting and drop_caches hacks exist for exactly this reason.
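A back-of-the-envelope sketch of why overcommit is attractive and why a guest kernel that fills its memory with cache undermines it. All numbers are made-up assumptions (a 64 GiB host, sixteen guests given 8 GiB each, each jailed app really needing about 2 GiB), and the helpers `overcommit_ratio` and `host_fits` are hypothetical.

```python
GIB = 1024 ** 3


def overcommit_ratio(host_bytes, guest_allocations):
    """Memory promised to guests divided by memory the host actually has."""
    return sum(guest_allocations) / host_bytes


def host_fits(host_bytes, guest_resident):
    """True if the memory guests actually keep resident fits in host RAM."""
    return sum(guest_resident) <= host_bytes


host = 64 * GIB
allocated = [8 * GIB] * 16          # 128 GiB promised to guests
app_working_set = [2 * GIB] * 16    # what each jailed app really needs

# Case 1: guests give back memory they are not using (the goal of free page reporting).
print(overcommit_ratio(host, allocated))
print(host_fits(host, app_working_set))    # 32 GiB resident

# Case 2: each guest kernel grows its page cache until it uses its full allocation.
with_cache = [8 * GIB] * 16
print(host_fits(host, with_cache))         # 128 GiB resident
```

The overcommit ratio is 2.0 in both cases; the only difference is whether guest kernels hold on to memory they don't strictly need, which is what the drop_caches hacks try to prevent.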

I would expect to eventually see high-performance custom kernels for application jails. For example, gVisor[1] acts as both a syscall interceptor (and can use KVM too!) and a custom kernel. Another option is a modified Linux kernel with its pain points patched for the guest role.
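A toy illustration of the "syscall interceptor plus custom kernel" idea: the hypothetical `UserspaceKernel` below services `open`/`read`/`write` itself over an in-memory filesystem, refuses anything it does not implement, and only passes stdout writes through to the "host". This is a sketch under heavy simplifying assumptions, not gVisor's actual design, which is far more involved.

```python
import errno


class UserspaceKernel:
    """Toy guest kernel that services syscalls itself (hypothetical, gVisor-inspired)."""

    def __init__(self, files):
        self.files = dict(files)  # in-memory filesystem: path -> bytes
        self.fds = {}             # fd -> [path, offset]
        self.next_fd = 3
        self.host_calls = []      # log of what would reach the real host kernel

    def syscall(self, name, *args):
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            return -errno.ENOSYS  # unimplemented: never forwarded to the host
        return handler(*args)

    def sys_open(self, path):
        if path not in self.files:
            return -errno.ENOENT
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = [path, 0]
        return fd

    def sys_read(self, fd, count):
        if fd not in self.fds:
            return -errno.EBADF
        path, offset = self.fds[fd]
        data = self.files[path][offset:offset + count]
        self.fds[fd][1] += len(data)  # advance the file offset
        return data

    def sys_write(self, fd, data):
        if fd == 1:
            # stdout is the only thing forwarded through the narrow host interface.
            self.host_calls.append(("write", 1, data))
            return len(data)
        return -errno.EBADF


k = UserspaceKernel({"/etc/hostname": b"sandbox\n"})
fd = k.syscall("open", "/etc/hostname")
data = k.syscall("read", fd, 64)
k.syscall("write", 1, data)
print(k.syscall("mount", "/dev/sda", "/mnt") == -errno.ENOSYS)
print(k.host_calls)
```

The application sees a full-looking syscall interface, while the host only ever sees a single kind of `write`.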

In effect, what virtualization achieves is the ability to roll back much of the advantage of having an operating system in the first place, in exchange for securely isolating the workload. But because the workload expects an underlying operating system to serve it, one has to be provided. So now you have a host operating system, a guest operating system, and some narrow interface between the two so that the whole thing isn't a complete clown show. As you grow that interface to properly slave the guest to the host, reducing resource consumption and gaining more control, perhaps you eventually end up reimagining the operating system? Or you come full circle to the BSD jail idea: imagine the host kernel having hooks into every guest kernel syscall. Is this not a BSD jail with extra steps?

[1] <https://gvisor.dev/>