by fefe23 1054 days ago
This is a step back.

The reason to have this in a separate process is so it can be audited "to death" because the code base is small.

gvisor itself is so big that doing an exhaustive audit is out of the question. Google has mostly switched to fuzzing because the code bases have all become too bloated to audit them properly.

The reason you have gvisor is to contain something you consider dangerous. If that contained code managed to break out and take over gvisor, it is still contained in the kernel level namespaces and still cannot open files unless the broker process agrees. That process better be as small as possible then, so we can trust it to not be compromisable from gvisor.

EDIT: Hmm, looks like they aren't removing the broker process, just "reducing round-trips". Never mind then. That reduces the security cost to this: you can no longer take write access away at run time from a file that was already opened for writing.

1 comments

The reason you can focus auditing on the second process is because you have a security architecture that enables that. Of course the security mechanisms you’re relying on there need to be exercised and occasionally fall apart too (meltdown, MDS, etc.).

Process isolation is not the only tool that you have to build a secure architecture. In this case, capabilities are still being limited by available FDs in the first process (as well as seccomp, namespacing and file system controls), and access to FDs is still mediated by the second process. There is no such thing as “being able to take access away … to a file that was already opened” as this is simply not part of the threat model or security model being provided. You still need to be diligent about these security mechanisms as well.

The idea that Google has given up and just does fuzzing is nonsense. Fuzzing is a great tool, and has become more common and standardized — that’s all. It is being added to the full suite of tools.

As I understand it, the new model is that the process gets an opened fd passed by the broker and can then read and write to it as fd permissions allow.

The old model, however, was that read and write were translated to RPC calls to the broker. In that model you can take write access away even after you have given it to a process, because you have not actually given it. All writes still go through the broker process.
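To make the distinction concrete, here is a rough Python sketch of the two models as described in this comment. It is not gVisor code: the `Broker` class, `rpc_write`, `donate_fd` and `demo_revocation` are made-up names, and the "RPC" is just a method call. It assumes Linux/Unix and Python 3.9+ for `socket.send_fds`/`socket.recv_fds`.

```python
import os
import socket
import tempfile


class Broker:
    """Hypothetical broker that owns the real file descriptor."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.write_allowed = True

    def rpc_write(self, data):
        # Mediated model: every write is checked, so access can be revoked later.
        if not self.write_allowed:
            raise PermissionError("write access revoked by broker")
        return os.write(self._fd, data)

    def donate_fd(self, sock):
        # Donated-FD model: pass the descriptor over a Unix socket (SCM_RIGHTS).
        socket.send_fds(sock, [b"fd"], [self._fd])

    def close(self):
        os.close(self._fd)


def demo_revocation():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.txt")
        broker = Broker(path)
        broker_end, sandbox_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

        # The "sandbox" side receives its own copy of the descriptor.
        broker.donate_fd(broker_end)
        _msg, fds, _flags, _addr = socket.recv_fds(sandbox_end, 16, 1)
        donated = fds[0]

        # The broker now tries to revoke write access.
        broker.write_allowed = False
        try:
            broker.rpc_write(b"mediated\n")
            mediated_ok = True
        except PermissionError:
            mediated_ok = False

        # The donated descriptor bypasses the broker's check entirely.
        os.write(donated, b"direct\n")

        with open(path, "rb") as f:
            content = f.read()

        os.close(donated)
        broker.close()
        broker_end.close()
        sandbox_end.close()
        return mediated_ok, content
```

Calling `demo_revocation()` shows the mediated write being refused after revocation while the write through the donated descriptor still lands in the file.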

> The old model, however, was that read and write were translated to RPC calls to the broker.

In the old model, reads/writes were not translated to RPCs. For regular files only, the broker donated FDs to the sentry (the userspace kernel), and the sentry was allowed to perform read(2)/write(2) directly. This was done as a performance optimization a long time ago.

What is different with directfs is that now the broker additionally donates FDs for other types of files as well (directories, sockets, etc.) and the sandbox is allowed to operate on those FDs with more syscalls like mkdirat, symlinkat, etc. This drastically increases the independence of the sandbox in performing filesystem operations, so it does not need to invoke the broker via RPCs.
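For a feel of what "operating on a donated directory FD" looks like, here is a small Python sketch (not gVisor code). `populate_via_dirfd` and `demo_dirfd` are hypothetical names, and a temp directory stands in for the container root. On Linux, passing `dir_fd=` to these `os` calls uses the *at family of syscalls (mkdirat, symlinkat, openat, readlinkat).

```python
import os
import tempfile


def populate_via_dirfd(root_fd):
    # All paths below are resolved relative to root_fd, not the process cwd.
    os.mkdir("data", dir_fd=root_fd)               # mkdirat(2)
    os.symlink("data", "current", dir_fd=root_fd)  # symlinkat(2)
    fd = os.open("data/hello.txt", os.O_WRONLY | os.O_CREAT, 0o600,
                 dir_fd=root_fd)                   # openat(2)
    try:
        os.write(fd, b"hi\n")
    finally:
        os.close(fd)


def demo_dirfd():
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a directory FD donated by the broker.
        root_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            populate_via_dirfd(root_fd)
            entries = sorted(os.listdir(root_fd))
            target = os.readlink("current", dir_fd=root_fd)
            return entries, target
        finally:
            os.close(root_fd)
```

Note that a directory FD by itself does not stop `..` traversal. Confinement has to come from namespaces and the other kernel-level controls mentioned in this thread.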

As described, the sentry is still constrained to operating on only the container filesystem via namespaces and other Linux security primitives.