Hacker News
by adiabatichottub 103 days ago
One question I've always had about these capability systems is: why isn't there a way to set capabilities from the parent process when execing? Why trust a program to set its own capabilities? I know that having a process set capabilities on itself doesn't break existing tools, but it seems like if you really wanted a robust system it would make sense to have the parent process, the user's shell for example, set the capabilities on its children, and have those capabilities be inheritable so the child could spawn other processes with the same or fewer capabilities (if it's allowed to do that at all). Is there an existing system that works this way, in or outside of the UNIX family? Or maybe some research paper written on the subject? I'd love to know.

You may be interested in OpenBSD's pledge[1][2][3].

> Why trust a program to set its own capabilities?

One example: a program may need a wide range of capabilities at startup but can ratchet down to a reduced set once it's running, aka "privdrop".
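A minimal sketch of that privdrop pattern in C, assuming OpenBSD's pledge(2) with the signature `int pledge(const char *promises, const char *execpromises)`. The helper name `read_with_privdrop` is hypothetical. Off OpenBSD, a do-nothing stub stands in for pledge so the sketch compiles, and that stub provides no sandboxing at all.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#ifndef __OpenBSD__
/* Assumption: off OpenBSD this no-op stub replaces pledge(2) so the sketch
   compiles. It does NOT sandbox anything. */
static int pledge(const char *promises, const char *execpromises)
{
    (void)promises;
    (void)execpromises;
    return 0;
}
#endif

/* Hypothetical helper, for illustration: read up to len bytes of path into
   buf, shedding privileges along the way.
   Returns the number of bytes read, or -1 on error. */
static ssize_t read_with_privdrop(const char *path, char *buf, size_t len)
{
    /* Start-up phase: we still need to open files ("rpath"). */
    if (pledge("stdio rpath", NULL) == -1)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Ratchet down: from here on only I/O on descriptors we already hold.
       Promises can be reduced again later but never widened. */
    if (pledge("stdio", NULL) == -1) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

On OpenBSD the promises apply to the whole calling process, so a real program would make the second pledge() call at the point where its start-up work is finished rather than inside a library helper.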

> why isn't there a way to set capabilities from the parent process when execing?

Others have already replied about other systems, so to stick with pledge: it provides the ability to set "execpromises", which does exactly this.

[1] https://man.openbsd.org/pledge

[2] https://www.openbsd.org/papers/eurobsdcon2017-pledge.pdf

[3] https://www.openbsd.org/papers/BeckPledgeUnveilBSDCan2018.pd...
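Roughly what the parent-sets-it approach looks like with execpromises, as an untested sketch in C assuming OpenBSD's pledge(2) semantics: the forked child calls pledge(NULL, execpromises), which leaves its own promises unchanged but fixes the promises the exec'd program starts with. `spawn_pledged` is a hypothetical helper name, and on other systems a stub replaces pledge, so there the child runs unrestricted.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __OpenBSD__
/* Assumption: off OpenBSD this no-op stub replaces pledge(2) so the sketch
   compiles. It does NOT sandbox anything. */
static int pledge(const char *promises, const char *execpromises)
{
    (void)promises;
    (void)execpromises;
    return 0;
}
#endif

/* Hypothetical helper: run argv[0] (an absolute path) under promises chosen
   by the parent. Returns the child's exit status, 127 if exec failed,
   or -1 on error. */
static int spawn_pledged(const char *execpromises, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* promises == NULL: leave this (unrestricted) child as it is and only
           set the promises the next program image will start with. */
        if (pledge(NULL, execpromises) == -1)
            _exit(127);
        execv(argv[0], argv);
        _exit(127); /* reached only if execv failed */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program started this way can still pledge itself further down, but it cannot widen the set its parent chose.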

I think you're talking about "execpromises"?[1] I'll have to study it a bit.

[1] https://bsdb0y.github.io/posts/openbsd-intro-to-update-on-pl...

I've only really messed with Capsicum. You can certainly cap_enter between fork and exec (strictly, fexecve, since capability mode blocks path-based execve), but depending on exactly what your target does, it's really not simple to do anything meaningful beyond basic capability mode without changes to the program.

The way capabilities usually work is that you more or less turn off the usual do-whatever-you-want syscalls and have to do restricted things through FDs that carry the capability to do them. So, no more opening any path; you have to use openat with an FD for your directory of interest. But that requires the program to understand how to use capabilities and how to be passed them. It's not something you can just impose.

My understanding of SELinux is that it can be imposed on a program without the program's knowledge, because it's more or less matching rules against syscalls: rather than giving the program a restricted FD to use with openat, you restrict what open will allow.

I am less sure about the others (Capsicum, seccomp), but the threat model for OpenBSD's pledge is not that you don't trust the process. You do trust the process, otherwise you would not be running it. The threat pledge is trying to address is a process that gets corrupted by a malicious agent while it is running, and the goal is to keep the fallout minimal. Under this threat model the process tells the kernel to shed capabilities as soon as it no longer needs them, something that can only be done in-process.

OpenBSD had a neat external syscall sandboxing system at one point (systrace). It was removed for reasons I don't fully understand, but I think it boils down to "optional security isn't": external policies are hard to maintain and problematic, and the first thing you do is disable them (cough SELinux cough).

This is essentially what containers are: Bubblewrap / Docker / Podman. I think the primary issue is that very few applications on desktop systems are actually designed with sandboxing in mind, unlike, say, apps on a phone.

I'm not terribly familiar with Linux container systems, cgroups and all that, but I have been down the rabbit hole with FreeBSD's jails, and I definitely wouldn't call them a capabilities system. You can lock down the environment quite a bit, and limit or even virtualize the network stack, but you can't say, "Here, process, have your standard IO streams and nothing more. Go forth and compute." The process isn't blind to its environment. You're still in the same basic UNIX user security model. It's really somewhere between chroot and full virtualization.

A default container seccomp profile will let you do quite a few things, but you can supply a different profile (some JSON) and limit it to just a few system calls if you want, such as doing IO on already-open FDs without the ability to open new ones. I think the runtime opens the FDs before the child process starts, and the child inherits them.

You can mostly do that with seccomp on Linux (I have no experience with FreeBSD).

Child processes inherit the restrictions from the parent, so you can have the parent fork, set up its rules, then exec. This is exactly how syscall filtering (and a bunch of other lockdowns) is implemented in systemd.
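A Linux-only sketch of that fork, set up rules, exec sequence in C, using a raw seccomp BPF filter rather than libseccomp or systemd. `deny_mkdir` and `run_without_mkdir` are hypothetical names, the policy (make mkdir/mkdirat fail with EPERM) is an arbitrary example, and only x86_64 and aarch64 are handled.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Hypothetical helper: install a filter that makes mkdir/mkdirat fail with
   EPERM. Illustrative denylist only; real policies should be allowlists
   (and on x86_64 should also reject x32 syscall numbers). */
static int deny_mkdir(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if it uses a different syscall ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the syscall number and jump to EPERM on a match. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_mkdir
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 2, 0),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdirat, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    /* Required for an unprivileged process to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: fork, restrict the child, then exec argv. The filter
   survives execve. Returns the child's exit status (126 if the filter could
   not be installed, 127 if exec failed), or -1 on error. */
static int run_without_mkdir(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (deny_mkdir() != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only mkdir is blocked here because the target is an ordinary dynamically linked program: a filter that denied openat before the exec would also stop the dynamic loader from opening shared libraries, which is one concrete reason an externally imposed policy has to be designed around what the target actually does.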

Answering without reading TFA here, but I am familiar with Capsicum.

I am pretty sure you CAN get your capabilities from a parent process with Capsicum, since they are just file descriptors.
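To illustrate the "they are just file descriptors" point: descriptors are inherited across fork and exec, and can also be handed to another process over a UNIX-domain socket with SCM_RIGHTS. A minimal POSIX sketch in C; the helper names `send_fd` and `recv_fd` are hypothetical, and Capsicum itself isn't needed to compile it. As far as I know, under Capsicum any rights narrowed with cap_rights_limit(2) travel with the descriptor.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected AF_UNIX socket.
   Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { /* union keeps the control buffer correctly aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor. The kernel installs a new
   descriptor number referring to the same open file. Returns it, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a parent/child setup the same two calls work across fork(): the child can be handed a descriptor at any point without ever being able to open the underlying file itself.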