Hacker News new | ask | show | jobs
by PeterWhittaker 103 days ago
Interesting article, but it compares apples to a fruit stand: The approach could be improved by comparing Capsicum to using seccomp in the same way.

Sometime ago I wrote a library for a customer that did exactly that: Open a number of resources, e.g., stdin, stdout, stderr, a pipe or two, a socket or two, make the seccomp calls necessary to restrict the use of read/write/etc. to the associated file descriptors, then lock out all other system calls - which includes seccomp-related calls.

Basically, the library took a very Capsicum-like approach of whitelisting specific actions then sealing itself against further changes.

This is a LOT of work, of course, and the available APIs don't make it particularly easy or elegant, but it is definitely doable. I chose this approach because the docker whitelist approach was far too open ended and "uncurated", if you will, for the use-case we were targeting.

In this particular case, I was aided by the fact the library was written to support the very specific use-case of filters running in containers using FIFOs for IPC, logging, and reporting: Every filter saw exactly the same interfaces to the world, so it was relatively easier to lock things down.

Having said that, I wish Linux had a Capsicum-equivalent call, or, even better for the approach I took, a friendlier way to whitelist specific calls.

1 comments

A problem with that approach is that libc can after an upgrade decide to start doing syscalls you were not expecting. Like the first time you call `printf()` it calls `newfstatat()`. Only the first time. Maybe in the future it'll call it more often than that, and then your binary breaks.

I'm not sure what glibc's latest policy is on linking statically, but at least it used to be basically unsupported and bugs about it were ignored. But even if supported, you can't know if it under some configurations or runtime circumstances uses dlopen for something.

Or maybe once you juggle more than X file descriptors some code switches from using `poll()` to using `select()` (or `epoll()`).

My thoughts last time I looked at seccomp: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...

This is a problem but fwiw libc's should be falling back to old system calls. You can block clone3 today and see that your libc will fall back to clone.
Yeah. But it still means wandering into de facto unsupported territory in a way that pledge/unveil/landlock does not.

Your example may be true, but I'm guessing it's not a guarantee. Not to mention if one wants to be portable to musl or cosmopolitan libc. The others inherently are more likely to work in a way that any libc would be "unsurprised" by.

Yeah for sure, it's a real issue. In general, seccomp feels hard to use unless you own your stack top to bottom.
> A problem with that approach is that libc can after an upgrade decide to start doing syscalls you were not expecting.

That would break capsicum, too, so I don’t see how that’s a problem when “comparing Capsicum to using seccomp in the same way”.

That's the approach I meant by "that approach", the library the parent commenter was talking about writing for a customer. Compare this to Landlock or OpenBSDs pledge/unveil.
Now that Landlock actually is a thing, have you considered writing another followup? Given what I've seen of landlock, I expect it'll be spicy...
I took the bait.

“The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock is a stackable LSM [(Linux Security Model)], it makes it possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. ... Landlock empowers any process, including unprivileged ones, to securely restrict themselves.”

https://docs.kernel.org/userspace-api/landlock.html

I've actually found it pretty fine. It doesn't have full coverage, but they have a system of adding coverage (ABI versions), and it covers a lot of the important stuff.

The one restriction I'm not sure about is that you can't say "~/ except ~/.gnupg". You have to actually enumerate everything you do want to allow. But maybe that's for the best. Both because it mandates rules not becoming too complex to reason about, and because that's a weird requirement in general. Like did you really mean to give access to ~/.gnupg.backup/? Probably not. Probably best to enumerate the allowlist.

And if you really want to, I guess you can listdir() and compose the exhaustive list manually, after subtracting the "except X".

I find seccomp unusable and not fit for purpose, but landlock closes many doors.

Maybe you know better? I'd love to hear your take.

I definitely don't know better, and after taking a few more looks at landlock, I'm not even sure what my objections were, probably got it confused with something else entirely. Confusion and ignorance on my part I guess.