Would you say approaches like gvisor or nabla containers provide more/enough evolution on the security front? Or is there something new on the horizon that excites you more as a prospect?
GVisor basically works by intercepting all Linux syscalls, and emulating a good chunk of the Linux kernel in userspace code. In theory this allows lowering the overhead per VM, and more fine-grained introspection and rate limiting / balancing across VMs, because not every VM needs to run it's own kernel that only interacts with the environment through hardware interfaces. Interaction happens through the Linux syscall ABI instead.
From an isolation perspective it's not more secure than a VM, but less, because GVisor needs to implement it's own security sandbox to isolate memory, networking, syscalls, etc, and still has to rely on the kernel for various things.
It's probably more secure than containers though, because the kernel abstraction layer is separate from the actual host kernel and runs in userspace - if you trust the implementation... using a memory-safe language helps there. (Go)
The increased introspectioncapabiltiy would make it easier to detect abuse and to limit available resources on a more fine-grained level though.
Note also that GVisor has quite a lot of overhead for syscalls, because they need to be piped through various abstraction layers.
I actually wonder how much "overhead" a VM actually has. i.e. a linux kernel that doesn't do anything (say perhaps just boots to an init that mounts proc and every n seconds read in/prints out /proc/meminfo) how much memory would the kernel actually be using?
So if processes in gvisor map to processes on the underlying kernel, I'd agree it gives one a better ability to introspect (at least in an easy manner).
It gives me an idea that I'd think would be interesting (I think this has been done, but it escapes me where), to have a tool that is external to the VM (runs on the hypervisor host) that essentially has "read only" access to the kernel running in the VM to provide visibility into what's running on the machine without an agent running within the VM itself. i.e. something that knows where the processes list is, and can walk it to enumerate what's running on the system.
I can imagine the difficulties in implementing such a thing (especially on a multi cpu VM), where even if you could snapshot the kernel memory state efficiently, it be difficult to do it in a manner that provided a "safe/consistent" view. It might be interesting if the kernel itself could make a hypercall into the hypervisor at points of consistency (say when finished making an update and about to unlock the resource) to tell the tool when the data can be collected.
LibVMI-based debug server, implemented in Python. Building a guest aware, stealth and agentless full-system debugger.. GDB stub allows you to debug a remote process running in a VM with your favorite GDB frontend. By leveraging virtual machine introspection, the stub remains stealth and requires no modification of the guest.
> I actually wonder how much "overhead" a VM actually has. i.e. a linux kernel that doesn't do anything (say perhaps just boots to an init that mounts proc and every n seconds read in/prints out /proc/meminfo) how much memory would the kernel actually be using?
You don't necessarily need to run a full operating system in your VM. See eg https://mirage.io/
> I actually wonder how much "overhead" a VM actually has. i.e. a linux kernel that doesn't do anything (say perhaps just boots to an init that mounts proc and every n seconds read in/prints out /proc/meminfo) how much memory would the kernel actually be using?
> to have a tool that is external to the VM (runs on the hypervisor host) that essentially has "read only" access to the kernel running on the VM to provide visibility into what's running on the machine without an agent running within the VM itself
What I really want is a "magic" shell on a VM - i.e. the ability using introspection calls to launch a process on the VM which gives me stdin/stdout, and is running bash or something - but is just magically there via an out-of-band mechanism.
Not really "out of band", but many VMs allow you to setup a serial console, which is sort of that, albeit with a login, but in reality, could create one without one, still have to go through hypervisor auth to access it in all cases, so perhaps good enough for your case?
You can launch KVM/qemu with screen + text console, and just log in there. You can also configure KVM to have a VNC session on launch, and that ... while graphical, is another eye into the console + login.
(Just mentioning two ways without serial console to handle this, although serial console would be fine.)
Go is not memory safe even when the code has none of unsafe blocks, although through a typical usage and sufficient testing memory safety bugs are avoided. If one needs truly memory-safe language, then use Rust, Java, C# etc.
been out of the space for a bit (though interviewing again, so might get back into it), gvisor at least as the "userspace" hypervisor, seemed to provide minimal value vs modern hypervisor systems with low overhead / quick boot VMs (ala firecracker). With that said, I only looked at it years ago, so I could very well be out of date on it.
Wasn't aware of Nabla, but they seem to be going with the unikernel approach (based on a cursory look at them). Unikernels have been "popular" (i.e. multiple attempts) in the space (mostly to basically run a single process app without any context switches), but it creates a process that is fundamentally different than what you develop and is therefore harder to debug.
while the unikernels might be useful in the high frequency trading space (where any time savings are highly valued), I'm personally more skeptical of them in regular world usage (and to an extent, I think history has born this out, as it doesn't feel like any of the attempts at it, has gotten real traction)
I think many people (including unikernel proponents themselves) vastly underestimate the amount of work that goes into writing an operating system that can run lots of existing prod workloads.
There is a reason why Linux is over 30 years old and basically owns the server market.
As you note, since it's not really a large existing market you basically have to bootstrap it which makes it that much harder.
We (nanovms.com) are lucky enough to have enough customers that have helped push things forward.
For the record I don't know of any of our customers or users that are using them for HFT purposes - something like 99% of our crowd is on public cloud with plain old webapp servers.
"Note that while running within a nested VM is feasible with the KVM platform, the systrap platform will often provide better performance in such a setup, due to the overhead of nested virtualization."
I'd argue then for most people (unless have your own baremetal hyperscaler farm), one would end up using gvisor without kvm, but speaking from a place of ignorance here, so feel free to correct me.
From an isolation perspective it's not more secure than a VM, but less, because GVisor needs to implement it's own security sandbox to isolate memory, networking, syscalls, etc, and still has to rely on the kernel for various things.
It's probably more secure than containers though, because the kernel abstraction layer is separate from the actual host kernel and runs in userspace - if you trust the implementation... using a memory-safe language helps there. (Go)
The increased introspectioncapabiltiy would make it easier to detect abuse and to limit available resources on a more fine-grained level though.
Note also that GVisor has quite a lot of overhead for syscalls, because they need to be piped through various abstraction layers.