Hacker News
by Roark66 1054 days ago
This article is not very good at explaining what it is they are actually describing. Is directfs just a way to access the host's local fs? If so, then my understanding is that they used to use RPC to access the local fs (horrible overhead) in order to sandbox it. Now they've just replaced the part of the operating system's filesystem API that resolves paths to file descriptors with their own tool, so once a file descriptor is obtained the container can talk directly to the fs.
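To make the path-versus-descriptor distinction concrete, here is a minimal Python sketch. It is not gVisor or directfs code; `open_beneath` and its name check are hypothetical stand-ins for whatever does the path resolution. It only illustrates the idea that the checked step is turning a name into a file descriptor, and that reads on the descriptor afterwards involve no path at all.

```python
import os
import tempfile


def open_beneath(dir_fd: int, name: str) -> int:
    """Resolve `name` relative to an already-open directory descriptor.

    Hypothetical policy: accept only a plain file name, so the caller
    cannot walk out of the directory that `dir_fd` refers to.
    """
    if "/" in name or name in ("", ".", ".."):
        raise PermissionError(f"refusing to resolve {name!r}")
    # O_NOFOLLOW makes the open fail if the final component is a symlink.
    return os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "data.txt"), "wb") as f:
            f.write(b"hello from the host fs")

        root_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
        try:
            fd = open_beneath(root_fd, "data.txt")
            try:
                # No path is involved from here on: the read is a plain
                # syscall on the descriptor we were handed.
                return os.read(fd, 4096)
            finally:
                os.close(fd)
        finally:
            os.close(root_fd)


if __name__ == "__main__":
    print(demo())
```

The design point is that only the name-to-descriptor step needs to be policed; everything after it is ordinary descriptor I/O.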

To me this addresses a very narrow use case: running untrusted containers on trusted hosts. I imagine the main target users are people who want to offer a service like Fargate and run multiple customers on a single host. Why would they want to do that instead of separating customers with VMs? My suspicion is that this has something to do with the increasing availability of very energy-efficient ARM servers that have hundreds of cores per socket. My impression is that traditional virtualisation on ARM is rarely used (I'm not sure why, as KVM supports it and ARM has had hardware support for it since ARMv8.1). So "containers to the rescue".

Personally I'd much rather the extra security needed to give untrusted containers access to the host's fs were implemented in the container runtime, not as a separate component. Or, given the "security issues" it addresses, perhaps even in the host's operating system?

1 comment

> Personally I'd much rather the extra security needed to give untrusted containers access to the host's fs were implemented in the container runtime, not as a separate component. Or, given the "security issues" it addresses, perhaps even in the host's operating system?

Isn’t that exactly what the original gofer/RPC solution is? The gvisor container runtime operates in userland to ensure that compromises in the runtime don’t result in an immediate compromise of the system kernel.

But running in userland and intercepting syscalls that do IO always has significant performance implications: all your IO now needs multiple copy operations to get into the reading process's address space, because userland processes generally can't directly interact with each other's address spaces (to ensure process isolation) without asking the kernel to do all the heavy lifting.

So if you want fast local IO, you have to find a way to allow the untrusted processes in the container to make direct syscalls, so that you can avoid all the additional high-latency hops in userland and let the kernel copy data directly into the target process's address space.
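A rough Python sketch of the difference, under loud assumptions: this is not gVisor code, the function names are made up, and both ends of the socket live in one process purely to keep it short (a real setup would have a helper process and a sandboxed process). In the proxied version the file's bytes are copied into the helper, back into the kernel's socket buffer, and then into the reader. In the direct version only the descriptor crosses the socket, and the reader's own `read` has the kernel copy the data straight to it.

```python
import os
import socket
import tempfile


def proxied_read(path: str) -> bytes:
    """Helper reads the file and ships the bytes over a socket."""
    helper, sandbox = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with helper, sandbox:
        with open(path, "rb") as f:
            # Copy 1: kernel -> helper buffer. Copy 2: helper -> socket buffer.
            # (Fine for a small file; a large one would need a second thread.)
            helper.sendall(f.read())
        helper.shutdown(socket.SHUT_WR)

        chunks = []
        # Copy 3: socket buffer -> reader.
        while chunk := sandbox.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)


def direct_read(path: str) -> bytes:
    """Helper only opens the file; the descriptor itself crosses the socket."""
    helper, sandbox = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with helper, sandbox:
        fd = os.open(path, os.O_RDONLY)
        try:
            # send_fds uses SCM_RIGHTS ancillary data (Python 3.9+, Unix only).
            socket.send_fds(helper, [b"f"], [fd])
        finally:
            os.close(fd)

        _, fds, _, _ = socket.recv_fds(sandbox, 1, 1)
        try:
            # Single copy: kernel -> reader, with no helper in the data path.
            return os.read(fds[0], 4096)
        finally:
            os.close(fds[0])


def demo() -> tuple[bytes, bytes]:
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"payload")
        tmp.flush()
        return proxied_read(tmp.name), direct_read(tmp.name)


if __name__ == "__main__":
    print(demo())
```

Both functions return the same bytes; the difference is how many times the data is copied and how many userland hops sit between the file and the reader.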

To magically allow the container runtime to provide direct host fs access itself, with native-level performance, would require the runtime to operate as part of the kernel. That is exactly how normal containers work; it comes with a whole load of security risks, and is ultimately the reason gvisor exists.