by cookiengineer 1029 days ago
The irony behind it is that one could argue that we are using UNIX wrong, because technically each program should run as its own user with its own groups. Which is what apparmor and firejail/sandboxes kind of want to embrace but in practice people just care too little.
13 comments

Only sounds like "irony" if you don't understand the problem.

The problem is not isolation or lack of it. The problem is that apps require a complex set of permissions for both the user's files and other apps.

An app might want to send a notification to the notification daemon. But an app should not be able to pretend to be another app, whether by name or icon. And good luck trying to stop a malicious app from just making the same or a similar enough icon and spelling Firefox with some fancy Unicode characters to get around it.

And that's a pretty simple case! And it's already very hard to solve at the kernel/OS level. Now look at files.

You might want to allow a graphical editor to open any graphical file, regardless of location.

You might want to allow that same editor to only edit some of them.

But for a browser, you might want to allow saving new files, but not editing/rewriting existing ones, because it is not an editor and has no business editing files.

Or allow a browser tab on a certain URL (say, a web image editor) to modify the files, but not the image sharing webpage that only needs to read the file.

Now we not only have insanely granular permissions per app; the different actions within an "app" (a web browser is basically a container for multiple applications at that point) also need different permissions.

It has nothing to do with "unix bad" or "unix wrong". Actually separating the applications without hardship for the user (like fucking with permissions every time one app needs to touch the files of another app) is just very, very hard.

> You might want to allow a graphical editor to open any graphical file, regardless of location.

More likely, you want to temporarily give them permission to specific files you indicate. A graphical editor has no reason to read any file that the user didn’t explicitly pick for editing/viewing.

That’s how macOS works nowadays (possibly except for the ‘temporarily’; I don’t know the details): applications can only open files that the user selected in the system file open dialog. That dialog runs in a separate process and opens up the app’s sandbox to allow access to the file the user selected.

That limits your application though. It means you have to use the system file picker. For many apps that might be fine. But it means you can't have something like vim or emacs where you open files with a command. Or have an option that does something like open a sibling .h file when you are editing a .c file. Or search up the directory to find the applicable .editorconfig file.
So why does it work for Mac, Android, and iOS?
In fact, it doesn't work. Both Android and macOS apps will commonly ask for "full filesystem access" permissions for this exact purpose, which sort of defeats the point (for those apps at least). I don't use iOS enough to speak to how this is handled there, but the few times I've had to wrangle some files on there it made me want to smash my head into the wall.
Well, the examples given don't, generally speaking. For stuff like compiling you can do things like have the permission apply to an entire folder, though.
You can do this on basically any modern unix by passing file descriptors over a unix socket: the “graphical editor” server would launch as a user that can’t access anything except a socket and then users would open files by pushing an open fd to the editor over its socket.
This sounds interesting but I don’t understand what the underlying mechanism is. For me, a file descriptor is just an int corresponding to something I can read from and write to and a socket just carries bytes. I don’t understand how an FD can be sent over a socket or, if it can, how that’s anything more than just sending an int?
It's a special API that tells the kernel to duplicate the FD and give it to a different process.

https://linux.die.net/man/7/unix

    SCM_RIGHTS
        Send or receive a set of open file descriptors from another process. The data
        portion  contains an integer array of the file descriptors. The passed file 
        descriptors behave as though they have been created with dup(2).
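
A minimal sketch of that mechanism in Python (3.9+), assuming a Unix-like system. `socket.send_fds` and `socket.recv_fds` wrap `sendmsg()`/`recvmsg()` with an `SCM_RIGHTS` ancillary message. The helper names and the single-process `demo` are made up for illustration; in the editor scenario described above, the two ends of the socket would live in different processes running as different users.

```python
import os
import socket
import tempfile


def send_file_to_peer(sock: socket.socket, path: str) -> None:
    """Open `path` ourselves and hand the open descriptor to the peer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # One byte string of ordinary data plus one descriptor as SCM_RIGHTS.
        socket.send_fds(sock, [b"open"], [fd])
    finally:
        # Safe to close: the in-flight message keeps the open file alive.
        os.close(fd)


def receive_file(sock: socket.socket) -> int:
    """Receive one descriptor; the returned int is valid in this process."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return fds[0]


def demo() -> bytes:
    # Both ends live in one process here purely to keep the sketch short.
    user_side, editor_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        f.write(b"hello from the user\n")
        path = f.name
    try:
        send_file_to_peer(user_side, path)
        fd = receive_file(editor_side)
        with os.fdopen(fd, "rb") as received:
            return received.read()
    finally:
        os.unlink(path)
        user_side.close()
        editor_side.close()
```

The integer the receiver gets is an index into its own descriptor table and need not match the sender's number. What crosses the socket is a reference to the same open file, which is why the receiver can read it even if it could not have opened the path itself.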
There are a few interesting uses. For example, if you want to restart a network server, the old process can send its open, listening socket to the new process and thus achieve a seamless switchover.

Another nifty thing with UNIX sockets is that you can just... read which user sent the message, and since it is the kernel adding that metadata, you can be sure it came from that user. That's, for example, how you can set up PostgreSQL so that a certain user on the system can log in as themselves without needing a password.
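
As a rough, Linux-only sketch of reading that kernel-supplied identity in Python: the `SO_PEERCRED` socket option returns a `struct ucred` (pid, uid, gid) for the peer of a connected UNIX socket. The helper names are hypothetical, and this is not a claim about how PostgreSQL itself implements its peer authentication.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end (Linux only)."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


def demo() -> tuple[int, int, int]:
    # With socketpair() both ends belong to this process, so the "peer" is us.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        return peer_credentials(a)
    finally:
        a.close()
        b.close()
```

Note that these credentials are recorded when the socket is connected (or created with `socketpair()`), not per message. A server would call something like `peer_credentials` on each accepted connection and compare the uid against whatever policy it has.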

https://en.wikipedia.org/wiki/Unix_domain_socket

In addition to sending data, processes may send file descriptors across a Unix domain socket connection using the sendmsg() and recvmsg() system calls. This allows the sending processes to grant the receiving process access to a file descriptor for which the receiving process otherwise does not have access.[2][3] This can be used to implement a rudimentary form of capability-based security.

I’m not exactly sure of the terminology, but there’s an opaque object corresponding to the int that can be passed between processes via unix sockets. I believe nginx and other web servers do this to transfer open connections to the new server process on restart without interruption.
You can express most of this using the existing capabilities in linux, the issue is that the interfaces you use to do stuff need to change in order to actually make it usable (as opposed to just instantly disabled as soon as it becomes a problem, like apparmor).
While true and actually pretty cool, a comment like this is a pretty good explanation of why we haven’t had widespread adoption of Linux on the desktop. I can imagine the users’ eyes glazing over.
I wouldn’t tell a user this, but developers of the desktop environments and distributions are leaving a lot of the design space unexplored.
Flatpak [1] offers something similar on Linux:

> The FileChooser portal allows sandboxed applications to ask the user for access to files outside the sandbox. The portal backend will present the user with a file chooser dialog.

> The selected files will be made accessible to the application via the document portal, and the returned URI will point into the document portal fuse filesystem in /run/user/$UID/doc/.

[1]: https://docs.flatpak.org/en/latest/portal-api-reference.html...

> Which is what apparmor and firejail/sandboxes kind of want to embrace but in practice people just care too little.

In practice I don't have the time to debug every shitty little app armor integration for weeks. I lost days to libvirt-manager because its app armor support was enforced and not even half assed. Some configuration paths would automatically get whitelisted in its auto generated app armor profiles, others would just get you a file not found until you whitelisted them manually. The process responsible for generating these profiles would also silently kill itself if it encountered a path that was on its internal ban list, have fun debugging that when you do things like using an alternative bios rom, which by default are all stored in a blocked path.

AppArmor feels like security through obscurity: unless you already know that you are dealing with AppArmor fuckery, there is no chance in hell that you will be able to run your application. And not being able to run anything is the holy grail of security.

Regarding the last paragraph... AppArmor writes pretty verbose messages visible in journalctl (and in dmesg, I think), so it's not really obscure.

I used libvirt with AppArmor and was pretty satisfied with it.

the problem is "just that" is not good enough

because programs often need to have part of the capabilities of the user which started them, just a very well controlled subset of them, something which the UNIX model can't properly represent (though you can hack it on top of it)

There is also the problem of having by default "owner-user owner-group other" as permission sets for files and executables. This works if "other" is "other humans" (assuming it does work; security issues on shared systems based on that were not uncommon). But this works much less well if you want to protect users from rogue programs, because then "other" tends to be far too permissive.
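
To make the "other is too permissive" point concrete, here is a simplified Python model of the classic owner/group/other check for a non-root process. It deliberately ignores ACLs, capabilities, and root; the function name and the UIDs/GIDs in the usage note are made up for illustration.

```python
def classic_unix_allows(mode: int, file_uid: int, file_gid: int,
                        uid: int, gids: set[int], want: int) -> bool:
    """Approximate the owner/group/other permission check.

    `want` is a mask of rwx bits (4 = read, 2 = write, 1 = execute).
    Exactly one class applies: owner if the UID matches, else group if
    the file's group is among the process's groups, else other.
    """
    if uid == file_uid:
        granted = (mode >> 6) & 0o7
    elif file_gid in gids:
        granted = (mode >> 3) & 0o7
    else:
        granted = mode & 0o7
    return (granted & want) == want
```

With a typical `0o644` file owned by a human user 1000, a hypothetical per-program user 2001 that shares no group with the owner can still read it: `classic_unix_allows(0o644, 1000, 1000, 2001, {2001}, 4)` is true, because the "other" bits apply to every program user alike. Tightening the file to `0o640` blocks it, but then the only knob left is group membership.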

Process owned by human-user fork(2)s and then exec(2)s suid program owned by program-user; program owned by program-user then does most of the work; but calls back over a domain socket to program owned by human-user to get it to do things on the program-user’s behalf.

Picture: local DB client, remote DB server. Server can stream a file to the client for the client to write to disk. “On the same machine, as a different user” is just the trivial case of “over the network.”

This doesn't actually provide the benefit of application isolation though; if the software is malicious or vulnerable the as-user component could be as well. Remember that the biggest use case for application isolation is untrusted applications. Essentially any setuid-based approach to isolation requires a trusted developer using very good practices to remain secure, and that's why it's faded away.
What's insecure about setuid if the setuid user isn't a privileged user? For example, a setuid-nobody program, shouldn't be any more insecure than a systemd service spawned as User=nobody, no?

(Also, implied is that any untrusted logic lives in the spawned program, while the "client" program is simple and auditable. As I said: like a database client vs a database server. Or how about: like a client that wants to print something, vs. a print server embedding untrusted printer drivers!)

like I said: hacks
If everything is a file, and files can have permissions, then you can simply allow the "program user" access to those files using groups.
The group model is far too inflexible to make this realistic... A file can only have one group, and people use more than one application. ACLs are available on Linux (although seldom used) and help to address this problem, but the ergonomics are very poor. Since ACLs don't address the issue of syscalls, IPC other than file based, etc., it hasn't really made sense to make them the focus of application isolation efforts. The kernel namespacing and capabilities features are a lot more attractive for this use and are more similar to the historic approach of chroot... But the tools still aren't great.
>A file can only have one group, and people use more than one application

But users can be in multiple groups. You can have files with groups like "graphics, audio" etc. and give access to the application users by adding that user to the relevant groups.

>IPC other than file based

This isn't the UNIX model though, is it?

Though I agree with you. Given the current state of programs, file permissions aren't enough for isolation.

That's what Android does: each app runs as a different user.
The issue with implementing that on traditional UNIX systems is that only root can impersonate another user. (Mechanisms such as su/sudo are achieving their goal through a setuid bit, and implement a policy using executable code in user space, which historically hasn't been without its own share of bugs.)

Next problem will be sharing data between programs that legitimately need to do so; if I had an _emacs user that owned my source code, how do I make it non-painful for the _gcc user to read the source and write the resulting executables (which would end up in a directory owned by _emacs)? What about git, various preprocessors/generators, formatters, linters?

You'd have to step out of the traditional UNIX authn/authz model to effectively implement that. It's what various security-focussed OS's have been doing for a while anyway; e.g. OpenBSD implements unveil, which "hides" entire branches of the VFS tree. For example, if git has no business reading or writing files outside of the currently operated on repository, it can restrict itself very early in the process life - before proceeding to perform any of the "tricky" operations that are the common sources of security bugs.

> The irony behind it is that one could argue that we are using UNIX wrong, because technically each program should run as its own user with its own groups.

I think one problem with the UNIX design is that UIDs/GIDs are a flat namespace, and commonly only 32 bits in size (even on 64-bit systems), when what is really needed to meet contemporary requirements is a hierarchy, either with an unlimited number of levels, or at least generous limits. Allow a user to create sub-uids (such as one per application) and even sub-sub-uids (a web browser might create a sub-sub-uid for each website the user visits).

I think the Windows design of variable-length SIDs is in principle superior to the POSIX approach.

(Although, not necessarily in practice - it isn’t uncommon for Windows to make design decisions which in theory are superior to those of UNIX, but the practical implementation of them is full of warts, backward compatibility hacks, arbitrary limitations, and undocumented black boxes, which end up canceling out a lot of the theoretical advantage.)

Have you heard of user namespaces? They would match all your requirements it seems.
I have but I don’t agree that they do.

From what I understand, Linux user namespaces require you to reserve a UID range for each namespace to be mapped to its parent. Since you only have 32-bits to play with, you are forced to map multiple UIDs in the child namespace to the same UID in the parent, while many security decisions are based on the root user namespace UID only. So this is actually a lot more limiting and inflexible than Windows-style variable length UIDs would be.

> you are forced to map multiple UIDs in the child namespace to the same UID in the parent

Is that really a limit or just a thing for convenience?

I don't think besides 0 in namespace being the actual user in the actual system as a good convenience, that there is any "need" for pids per root-pid, and even if that happened it would save "root-pids".

And I find it unlikely as of now that a system would reach the 16-bit limit of running more than 65000 applications on a single system without hitting some other limit like /proc/sys/kernel/pid_max or /proc/sys/fs/file-max first.

> I don't think besides 0 in namespace being the actual user in the actual system as a good convenience, that there is any "need" for pids per root-pid, and even if that happened it would save "root-pids".

What happens with filesystems though? I would assume the filesystem is using the root user namespace. Which means if you have two different UIDs and they map to the same UID in the root namespace, they get collapsed into one for file ownership/etc. That seems a rather major limitation.

> And I find it unlikely as of now that a system would reach the 16-bit limit of running more than 65000 applications on a single system

With 32-bit identifiers, if you make each level 16-bit, you only have room for two levels. What if you have need for a third?

Also, you have to design a mapping from however many levels you need to the 32-bit flat namespace. A mapping which works well for one use case might turn out to be a problematic limitation in another. With variable-length UIDs there is no mapping to bother with.
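
The arithmetic behind the "only two levels" point, as a small Python sketch. The packing scheme (16 bits per level, most significant level first, unused levels padded with zero) is an assumption made for illustration, not an existing kernel or distribution convention.

```python
UID_BITS = 32
LEVEL_BITS = 16
MAX_LEVELS = UID_BITS // LEVEL_BITS  # 2 levels of 16 bits each


def pack_uid(path: list[int]) -> int:
    """Pack a hierarchical ID such as [user, app] into one flat 32-bit UID."""
    if len(path) > MAX_LEVELS:
        raise ValueError(
            f"{len(path)} levels do not fit: only {MAX_LEVELS} "
            f"levels of {LEVEL_BITS} bits fit in {UID_BITS} bits"
        )
    # Pad missing lower levels with 0, meaning "the parent itself".
    padded = list(path) + [0] * (MAX_LEVELS - len(path))
    uid = 0
    for value in padded:
        if not 0 <= value < (1 << LEVEL_BITS):
            raise ValueError(f"{value} does not fit in {LEVEL_BITS} bits")
        uid = (uid << LEVEL_BITS) | value
    return uid
```

Under this scheme `pack_uid([1000, 7])` is `1000 * 65536 + 7 = 65536007`, while a third level such as `pack_uid([1000, 7, 3])` raises `ValueError`. Choosing a different split (say 12/10/10 bits) buys a third level at the cost of fewer IDs per level, which is exactly the kind of use-case-specific mapping decision described above.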

> Also, you have to design a mapping from however many levels you need to the 32-bit flat namespace. A mapping which works well for one use case might turn out to be a problematic limitation in another. With variable-length UIDs there is no mapping to bother with.

Yes, this thing is gonna make IPv4 NAT look like a nice thing in comparison.

Yes, it will probably mean horrible kludge mapping of isolated-applications to UIDs, but until you get to 2^15 ~ 2^16 count of isolated-applications it should work fine.

Yes, this will be on a per-system basis, the resulting filesystem will be only useable by your system, and no other system.

What I'm saying is that in theory the "filesystem" and the "UIDs are 32 bit" parts are mostly there. They're there from the multi-user-big-box days not being used (except by Android/Linux).

> With 32-bit identifiers, if you make each level 16-bit, you only have room for two levels. What if you have need for a third?

The main reason why 65536 UIDs and GIDs are often submapped to every user is that POSIX systems often have a hardcoded assumption that user nobody is UID 65534, GID 65534, and if you want to run nested POSIX systems under POSIX systems without too many changes, reserving that many UIDs and GIDs is required.

There's no good place for that universal "nobody" user anyway, and if you're rethinking how the UID and GID mechanisms relate to security, definitely no place for a universal nobody, so you might as well map only the required amount of UIDs/GIDs per isolated-application.

That then leads to leaving unmapped UIDs unmapped on both host and isolated-applications.

Unless you're reaching the 2^15 ~ 2^16 count of isolated-applications then it should work fine.

Another option would be doing what Android (and supposedly flatpak) does: you should not be able to simply run whatever you want if you're an isolated-application. If as an isolated-application you need to run another isolated-application, you need to invoke "the platform" via `am` or `flatpak-spawn` and use it to spawn the other isolated-application.

> What happens with filesystems though? I would assume the filesystem is using the root user namespace. Which means if you have two different UIDs and they map to the same UID in the root namespace, they get collapsed into one for file ownership/etc. That seems a rather major limitation.

As far as I know most of what can be considered normal Linux filesystems (ext4, btrfs and I think xfs) support said 32-bit UIDs so you would not need to change filesystem code (and I believe changing and bugfixing filesystem code is always a scary proposition) to use a 32-bit mapping.

Nothing forces you to use only UIDs/GIDs; there are other security mechanisms that could be used:

* present every isolated-application with a different overlay filesystem. So you can have several things read/write to the same places, but every one has its own view of what is being read/written.

* displaying an entirely different file system for every isolated-application (bindfs as an example)

* every isolated-application has its own SELinux context (labels) or other forms of ACL applied to those files.

But I find this birthday attack scenario dubious: why would you "need" this UID overlap if the isolated-applications don't overlap outside of both namespaces?

If they aren't the same isolated-application it's the wrong thing to do and a security risk.

If they do map to the same isolated-application with the same set of data then trusting everything will be fine is a reasonable assumption. It isn't getting any more data or more privileges from being at different UIDs or GIDs in different contexts.

It seems like the obvious solution. Users are protected from one another in Unix, applications need to be protected from each other, therefore applications must be users.
I ran a SaaS for a long time before containerisation, and we would create a new Unix uid for each customer, and run the application instance exclusively under that uid. Coupled with a postgres database instance and properly isolated postgres roles, it felt like a reasonable way to isolate customers from each other.

The problem with this approach is that, of course, it really doesn’t scale easily. Eventually you need multi tenant, and eventually we ended up just pushing everything into the database, using row level security and tenant IDs. It worked great but felt more fragile (e.g., you can disable RLS).

I’m not an OS expert by any means, but I think ultimately the problem is that we’re using one operating system model for two orthogonal use cases.

I feel like we need a well-defined client model - “one user with multiple apps” - and a well-defined server model - “one app with multiple users”. But it’s not clear to me how the OS can help with the latter, since it’s going to be domain specific. Maybe Postgres’ model is the right answer after all.

Unix doesn't make it easy for an unprivileged user to switch to a different user account for just one app though. Plus it gets more complicated when your application wants to save something to the disk so it can be accessed by a different application.
"More complicated" but not by much. That's where groups come in.
> in practice people just care too little.

I tried to use Apparmor and SELinux, but how policies work is beyond me. Snap's sandboxing seems to be the closest thing to user-friendly sandboxing, but it's still not that user-friendly.

Maybe on the server/desktop side of things. In embedded Linux the "user per app" scheme is very useful and is embraced.
The issue I have with it is that a lot of living off the land techniques are caused by this false sense of how UNIX user and group management is supposed to work.

I mean, the correct approach would be to have groups even for specific network protocols because capabilities are not enough to sandbox a binary correctly, and the network group is pretty much pointless.

And then there's icmp, which brings us to the ping binary which on lazy distributions still has an SUID flag set, as well as glibc which still allows LD_PRELOAD by default because it is intended functionality from the perspective of its developers.

Most of these privilege escalation exploits can be mitigated, if users and groups and capabilities are managed correctly.

In practice I would probably recommend using the systemd seccomp sandboxes, because most of these quirks have been abstracted away there and are configurable in the service files - like file/folder access, user/group randomization, chrooting, capabilities etc.

That is what Android does. Each application (by default) gets its own user id.
Isn't this what Android does? Every app has its own user and group and you only get to manipulate "kinda-global-state" through platform APIs.