Hacker News
by alexgartrell 2052 days ago
Can you provide more context on why you feel that's true (or even possible)?

For the last few years, I managed the Container Runtime group at Facebook. My experience has been:

1. `if (has_capability(..., X)) { ... }` checks get sprinkled into kernel code pretty haphazardly, without much overall structure. Once a check is there, it's ABI, and you're screwed if you want to iterate on it. That's why CAP_SYS_ADMIN is /almost/ root.

2. If you wanted to do the right thing from the jump (e.g. for bpf itself), you'd have to add a new capability. That's a heavy lift for something that might never get any traction: it means changing a bunch of common tools, and you'll likely break some applications along the way.

3. Debugging capability failures is a pain in the ass. We ended up building and deploying capability tracing infrastructure just to figure out what people are actually using.

4. For gradual rollouts of enforcement/changes, you need the flexibility to warn first, enforce second. We did large-scale monitoring of all such changes to make sure we didn't break workloads.

5. Even if you nail all of the above, the ability to make finer-than-capability-grained decisions (e.g. binding to port 20 or 80 is okay, but not port 22) is really valuable.

I'm all for kernel abstractions that just work and solve all problems for all people, but I think the overwhelming trend has been toward kernel interfaces that provide a lot of flexibility, paired with more opinionated libraries/tools that kind of let us have our cake and eat it too (io_uring => liburing, bpf => libbpf, btrfs => btrfs-progs).


Are we talking about POSIX capabilities or object capabilities?
What POSIX and the Linux kernel call "capabilities" unfortunately causes quite a bit of confusion, which I believe is behind your post. POSIX capabilities bear little resemblance to actual capability-based security (where a capability is a send/recv-able token that references an object and a set of rights for interacting with that object).
I was not aware of object capabilities -- TIL.

That said, looking at the (apparently) leading implementation, Capsicum:

> Capsicum also introduces capability mode, which disables (with ECAPMODE) all syscalls that access any kind of global namespace; this is mostly (but not completely) implemented in userspace as a seccomp-bpf filter.

So I do feel that bpf ultimately enables building the kinds of abstractions that people want.