by thiht 458 days ago
How does the Landlock API compare to mount/network namespaces, as used in Docker containers? As I understand it, namespaces are for isolation, and Landlock would be more like access permissions, is that correct?

Would it be possible for the system to use the Landlock API to catch unauthorized net/fs access by an app and display a popup asking for authorization, like macOS does?


(Landlock reviewer here)

Namespaces can also be used for sandboxing, but they have a series of problems. Most importantly, they require more substantial changes to the program that wants to sandbox itself, and the program has to jump through a series of hoops to get everything into the right state. It is possible, but the resulting program environment ends up more unusual, and the mechanisms for enabling unprivileged namespaces make them difficult to use for smaller use cases. (It involves re-executing the program that wants to sandbox itself, whereas with Landlock, a small program can just install a Landlock policy during an early startup phase and continue from there.)
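To make the "install a policy during early startup" part concrete, here is a minimal sketch in Python using ctypes. It is an illustration under assumptions, not how any particular program does it: the helper names (`landlock_abi_version`, `restrict_to_read_only`) are made up for this example, the constants are copied from `<linux/landlock.h>` and `<linux/prctl.h>`, and only four filesystem access rights are handled. Every path passed in must be a directory.

```python
import ctypes
import os

# Constants copied from <linux/landlock.h>, <linux/prctl.h> and the syscall table.
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_ADD_RULE = 445
SYS_LANDLOCK_RESTRICT_SELF = 446
LANDLOCK_CREATE_RULESET_VERSION = 1
LANDLOCK_RULE_PATH_BENEATH = 1
ACCESS_FS_EXECUTE = 1 << 0
ACCESS_FS_WRITE_FILE = 1 << 1
ACCESS_FS_READ_FILE = 1 << 2
ACCESS_FS_READ_DIR = 1 << 3
PR_SET_NO_NEW_PRIVS = 38

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


class RulesetAttr(ctypes.Structure):
    # Only the first field of struct landlock_ruleset_attr (ABI v1).
    _fields_ = [("handled_access_fs", ctypes.c_uint64)]


class PathBeneathAttr(ctypes.Structure):
    # Field offsets match the kernel's packed struct; trailing padding is ignored.
    _fields_ = [("allowed_access", ctypes.c_uint64),
                ("parent_fd", ctypes.c_int32)]


def _check(ret, what):
    """Raise OSError with the current errno if a call returned a negative value."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, what + ": " + os.strerror(err))
    return ret


def landlock_abi_version():
    """Return the Landlock ABI version, or -1 if Landlock is unavailable."""
    return libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
                        ctypes.c_void_p(None), ctypes.c_size_t(0),
                        ctypes.c_long(LANDLOCK_CREATE_RULESET_VERSION))


def restrict_to_read_only(dirs):
    """Allow reads beneath `dirs`; deny read, write-open and execute elsewhere.

    Returns False without changing anything when Landlock is unavailable.
    """
    if landlock_abi_version() < 1:
        return False
    handled = (ACCESS_FS_EXECUTE | ACCESS_FS_WRITE_FILE |
               ACCESS_FS_READ_FILE | ACCESS_FS_READ_DIR)
    attr = RulesetAttr(handled_access_fs=handled)
    ruleset_fd = _check(libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), ctypes.byref(attr),
        ctypes.c_size_t(ctypes.sizeof(attr)), ctypes.c_long(0)),
        "landlock_create_ruleset")
    try:
        for path in dirs:
            fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
            try:
                rule = PathBeneathAttr(
                    allowed_access=ACCESS_FS_READ_FILE | ACCESS_FS_READ_DIR,
                    parent_fd=fd)
                _check(libc.syscall(
                    ctypes.c_long(SYS_LANDLOCK_ADD_RULE),
                    ctypes.c_long(ruleset_fd),
                    ctypes.c_long(LANDLOCK_RULE_PATH_BENEATH),
                    ctypes.byref(rule), ctypes.c_long(0)),
                    "landlock_add_rule")
            finally:
                os.close(fd)
        # Required before landlock_restrict_self unless we hold CAP_SYS_ADMIN.
        _check(libc.prctl(ctypes.c_long(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                          ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)),
               "prctl(PR_SET_NO_NEW_PRIVS)")
        _check(libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                            ctypes.c_long(ruleset_fd), ctypes.c_long(0)),
               "landlock_restrict_self")
    finally:
        os.close(ruleset_fd)
    return True
```

A program would call something like `restrict_to_read_only(["/usr", "/etc"])` near the top of `main()` and then carry on in the same process; there is no re-exec and no new namespace. The restriction cannot be lifted afterwards and is inherited by child processes.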

Controlling the rules through a separate process is not currently possible, but it was proposed earlier this month on the kernel mailing lists:

https://lore.kernel.org/all/cover.1741047969.git.m@maowtm.or...

I think in the upstream kernel LSMs are also still the only way to prevent a process from creating child namespaces where it has privileges?

E.g. if you can get CAP_NET_ADMIN, even within a restricted namespace, you have access to huge amounts of horribly broken kernel code. It's easy (for people who know how to exploit kernel bugs) to escalate privileges from there.

Distros have their own fixes for this issue, so namespaces definitely aren't useless in practice for sandboxing. But the basic mechanism just isn't that well suited to it.

The user.max_user_namespaces sysctl itself is namespace aware and is used by bubblewrap's --disable-userns option.

But a prctl like NO_NEW_PRIVS would be better, since it could avoid an intermediary namespace that is needed for the namespace-aware sysctl.

Ah, I didn't know about that. So you can block the child from creating a userns completely... That seems like an unnecessarily big hammer, but it probably works fine in 95% of cases?

I think what we probably want is an inherited mask of which capabilities you can get in child namespaces. I think I heard someone proposed that upstream, but I haven't seen the patches.

NO_NEW_PRIVS is quite irritating in a lot of contexts, since it breaks distant dependencies. For example, you can't run `ping`, so good luck debugging your networking!
> For example, you can't run `ping`, so good luck debugging your networking!

Sending ICMP Echo from unprivileged userspace (over a datagram ICMP socket rather than a raw socket) is a thing on Linux. From experience, on the public Internet it is always better, where possible, to rely on TLS connects (then TCP or UDP, and only then ICMP) to ascertain connectivity, lest some middlebox meddle with IP or transport-layer replies.
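A rough sketch of that unprivileged ICMP Echo, assuming a Linux kernel whose `net.ipv4.ping_group_range` sysctl includes the caller's group (otherwise socket creation is refused). The function names are made up for this example. The kernel manages the echo identifier on these sockets itself, so the value passed in is only a placeholder.

```python
import socket
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def icmp_checksum(data):
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b"ping"):
    """Build an ICMP Echo Request with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def ping_once(host, timeout=1.0):
    """Return True on an echo reply, False on timeout or send error,
    and None if the kernel refuses to create the unprivileged ICMP socket."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_ICMP)
    except OSError:
        return None
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_echo_request(0, 1), (host, 0))
            data, _ = sock.recvfrom(1024)
        except OSError:  # includes timeouts
            return False
        # On datagram ICMP sockets the reply starts at the ICMP header.
        return len(data) >= 1 and data[0] == ICMP_ECHO_REPLY
```

Because no setuid binary or file capability is involved, this path keeps working under NO_NEW_PRIVS, e.g. `ping_once("127.0.0.1")`.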

Great answer, thanks!
Namespaces (used by containers) are very powerful but they are also a door to a large attack surface: https://lwn.net/Articles/673597/

Landlock is (only) an access control system, but it's designed to let any process use it, including potentially untrusted ones, which makes it suitable for any app. It's similar to seccomp and complementary to it.