Hacker News
OpenBSD's new file(1) is now priv-separated (marc.info)
71 points by nayden 4067 days ago
4 comments

Here are the other two related commits:

https://marc.info/?l=openbsd-cvs&m=143014212727213&w=2

https://marc.info/?l=openbsd-cvs&m=143014250427343&w=2

Unfortunately, a lot of people depend on file(1), and many of them also run it as root.

Also previous HN discussion: https://news.ycombinator.com/item?id=9439778

Then why don't they run file(1) on a private copy?
OpenBSD has yet to successfully program human behaviour.
How easy is privilege separation nowadays? Are there any cross-platform libraries (well, cross-unix at least)?

Most programs I write I would be happy, fairly soon after startup, to drop to "just read and write handles I've already got". It would make me feel much better about my badly written parsers!

Cross-unix no. But pretty much every Linux distribution supports seccomp these days.
Privilege separation is achievable by using separate unprivileged users. If root is required, fork and drop root and then use something like OpenBSD's imsg(3) API to properly pass resources between the privileged parent and unprivileged child processes.

If you want to go a step further and sandbox, use setrlimit(2)/chroot(2). And if it's appropriate, use technologies like systrace(4) or Linux seccomp(2).

There are many examples of this in OpenBSD's base system, including some most people don't know about, like tcpdump(8).
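To make the fork-and-drop pattern above concrete, here is a minimal Python sketch. It is not OpenBSD's file(1) code, and it uses a plain pipe where a real program might use imsg(3). The parent opens the input, the child optionally drops to an unprivileged uid/gid, uses setrlimit to forbid new descriptors, and only then touches the untrusted bytes. The names `identify`, `classify` and `drop_to` are hypothetical, and `classify` is only a stand-in parser.

```python
import os
import resource


def classify(data: bytes) -> str:
    """Stand-in for the risky parser (hypothetical, not file(1)'s logic)."""
    if data.startswith(b"\x7fELF"):
        return "ELF"
    if data.startswith(b"MZ"):
        return "DOS/PE executable"
    return "data"


def identify(path: str, drop_to=None) -> str:
    """Parse `path` in a restricted child process.

    drop_to is an optional (uid, gid) pair for the child. It only works
    when the caller is root; which account to use is the caller's choice.
    """
    # Parent: open the input while it still has the rights to do so.
    in_fd = os.open(path, os.O_RDONLY)
    result_r, result_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: everything it needs is already open.
        status = 1
        try:
            os.close(result_r)
            if drop_to is not None:
                uid, gid = drop_to
                os.setgroups([])  # shed supplementary groups first
                os.setgid(gid)    # group before user, or setgid would fail
                os.setuid(uid)
            # No new file descriptors from here on; existing ones keep working.
            resource.setrlimit(resource.RLIMIT_NOFILE, (0, 0))
            data = os.read(in_fd, 4096)
            os.write(result_w, classify(data).encode())
            status = 0
        finally:
            os._exit(status)

    # Parent: only ever sees the child's short answer, never the raw input.
    os.close(in_fd)
    os.close(result_w)
    with os.fdopen(result_r, "rb") as reader:
        answer = reader.read().decode()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("parser child failed")
    return answer
```

Calling `identify("/bin/ls")` as a normal user runs the parser with the descriptor limit only. A root-run caller would pass something like `drop_to=(uid, gid)` for a dedicated unprivileged account. A chroot(2) or a syscall filter would be further steps on top of this.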

From my understanding file is a pretty simple program, why does it need to care about privilege separation?
Here are three bug reports for file(1): https://www.freebsd.org/security/advisories/FreeBSD-SA-07%3A... , https://www.freebsd.org/security/advisories/FreeBSD-SA-14%3A... , https://www.freebsd.org/security/advisories/FreeBSD-SA-14%3A... .

Quoting from the first:

> An attacker who can cause file(1) to be run on a maliciously constructed input can cause file(1) to crash. It may be possible for such an attacker to execute arbitrary code with the privileges of the user running file(1). ...

> No workaround is available, but systems where file(1) and other libmagic(3)-using applications are never run on untrusted input are not vulnerable.

And from the third:

> There are a number of denial of service issues in the ELF parser used by file(1). ...

> An attacker who can cause file(1) or any other applications using the libmagic(3) library to be run on a maliciously constructed input can cause the application to crash or consume excessive CPU resources, resulting in a denial-of-service.

From what I remember, it actually does quite complicated things internally. It runs heuristics on the file content, based on a set of rules, to determine the file format. You can also add new file formats this way: there is a 'magic' file you can add somewhere in /usr/share for that (I don't remember exactly where). And there are literally thousands of file types to test. Anyway, if you have ever coded in C you know what might happen with complex byte manipulation...
It has a history of vulnerability to maliciously constructed input files. It's often used as a validator ("is this file.jpg actually a jpg or an executable?") and/or run as root.
File parsing is a pretty hard problem. Let's take for example a Microsoft .exe file. They all start with the string MZ. However, saying something is an executable file is just the start of the rabbit hole. Is it purely a DOS executable, or is it a Windows PE executable? Or is it an OS/2 LE executable, or is it a wrapper around a COFF file created by DJGPP? Okay, now we know it's a PE file. Is this PE file actually a self-extracting archive file created by PKZIP? Or maybe RAR? All of those also have definite headers and offsets that you have to look at, and which the file utility needs to know how to look for, and that's where things can quickly get complicated.
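A rough Python sketch of just the first step of that rabbit hole, telling a plain DOS executable from its PE/NE/LE cousins, might look like this. The function name `describe_mz` and the returned strings are made up for illustration. The layout it relies on (a 32-bit little-endian offset at 0x3C pointing to the newer header's signature) is the standard MZ one. Note that the offset comes straight from the untrusted file, so it has to be bounds-checked before use; in C, forgetting that check is an out-of-bounds read.

```python
import struct


def describe_mz(data: bytes) -> str:
    """Classify an MZ executable by chasing the header offset stored at 0x3C."""
    if data[:2] != b"MZ":
        return "not an MZ executable"
    if len(data) < 0x40:
        # Too short to even contain the offset field.
        return "DOS executable"
    # e_lfanew: where the "new" (PE/NE/LE) header is supposed to start.
    (new_off,) = struct.unpack_from("<I", data, 0x3C)
    # Attacker-controlled value: never trust it without checking the bounds.
    if new_off + 4 > len(data):
        return "DOS executable"
    sig = data[new_off:new_off + 4]
    if sig == b"PE\0\0":
        return "PE executable"
    if sig[:2] == b"NE":
        return "NE executable"
    if sig[:2] in (b"LE", b"LX"):
        return "LE/LX executable"
    return "DOS executable"
```

Detecting a self-extracting archive inside the PE file would mean walking the section table and looking for further signatures, each with its own offsets to validate.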
file is indeed a pretty simple program. Why allow it to do more than it needs to?
It's a configurable parser. Parsers tend to be the hardest thing to get right, hence the bugs detected by AFL in FreeBSD's file, in the SQL parser in SQLite, etc. A lot of the vulnerabilities in apps dealing with image files come down to the parser being buggy. It seems simple until you actually try to implement it.
It depends on how much effort you put into making sure your parser is robust. I ran AFL tests for several days trying to find bugs in the Lua parser, but AFL kept discovering a way to load a binary chunk. After that, it didn't take long to crash on a malformed binary chunk. Priv separation is a good idea even if you trust your parser.
This is true for any piece of code, but you wouldn't say that about, say, httpd (oh, priv separation isn't needed if you put enough effort into making sure your protocol implementation is robust). Plus: file (well, magic) is a configurable parser.
http://en.wikipedia.org/wiki/Halting_problem

Welcome to the joys of parsing.

I think DasIch is saying "why allow file(1) to open sockets, write to arbitrary files, and run external programs"?

Given the correct input, at least a month ago, it could do all of those things.

(I am not sure that attempting to enforce this within the file(1) binary is optimal... after all, even though the attack surface is much reduced, file(1) could still have a bug somewhere prior to the sandboxing. If you could do a "chpriv -write_to_disk -socket -run_external_program /bin/file" that the OS would enforce, that would be cool. Someone should create that.)

> why allow file() to open sockets

If by "open sockets" you mean open existing sockets in read-only mode, it's so that it can identify them as sockets. If by "open sockets" you mean create new sockets, I don't think it does do that:

https://github.com/threatstack/libmagic/search?utf8=%E2%9C%9...

> write to arbitrary files

It appears it only does this if running on OS/2 and investigating what's inside a compressed file. Under these conditions, a temporary file is necessary for platform-specific reasons:

https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...

https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...

It also writes to a non-arbitrary mmapped file (the magic database), because that's how such databases work; you query them by writing to them in a particular way:

https://github.com/threatstack/libmagic/blob/3dea7072b8d7e92...

> run external programs

I can't find any examples where it does that. Do you know of any?

"It may be possible for such an attacker to execute arbitrary code with the privileges of the user running file(1)."

This is what I am saying...given the right input, file(1) could do anything and everything. Yes, it's only due to a bug in file(1), but still that's kind of ridiculous.

We have all sorts of things in place to protect against other bugs (for example, segmentation faults), and there's 27 years of evidence that we need some more help.

> why allow file(1) to open sockets, write to arbitrary files, and run external programs

Well, there's no code in file(1) to do that, but there's code that reads data in and makes decisions based on that data. Which means, if your attacker is more careful than the programmer was, you have possibly given that attacker a Turing machine.

I think you meant to link to http://en.wikipedia.org/wiki/Software_bloat

All file needs to do is scan for some magic bytestrings, and optionally print the numbers at a handful of offsets. It currently does much more than that, which is why it's insecure and hard to fix.
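For what it's worth, the minimal tool described here could be sketched in a few lines of Python. This is only an illustration of the "magic bytestrings plus a few numbers at fixed offsets" idea, not how file(1) or libmagic work; the `RULES` table and `match_magic` name are hypothetical, and the table holds just four well-known signatures.

```python
import struct

# (offset, magic bytes, description, optional (field offset, struct format, text))
RULES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image", (16, ">II", "{} x {}")),
    (0, b"\x7fELF", "ELF", None),
    (0, b"\x1f\x8b", "gzip compressed data", None),
    (257, b"ustar", "POSIX tar archive", None),
]


def match_magic(data: bytes) -> str:
    """Return the first rule whose magic bytes appear at its fixed offset."""
    for offset, magic, name, field in RULES:
        if data[offset:offset + len(magic)] != magic:
            continue
        if field is not None:
            f_off, f_fmt, f_text = field
            # Only decode the extra numbers if the file is long enough.
            if f_off + struct.calcsize(f_fmt) <= len(data):
                values = struct.unpack_from(f_fmt, data, f_off)
                name += ", " + f_text.format(*values)
        return name
    return "data"
```

Every offset here is a constant from the table, never a value read from the input, which is what keeps this version simple. The trouble starts with formats where the next offset is itself stored in the file.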

You say "needs" as if it's a matter of fact, but what a program needs to do is a very subjective matter.

If you feel the command does more than it needs to, could you call out a few examples of bloated features that you would cut?

Okay, I appear to have misremembered a problem in "strings" as being in "file", where it went overboard in parsing and introduced vulnerabilities.

But I haven't seen anything to disagree with file being similarly problematic. A quote like

> To sum up: If somebody uses 'file' in an unconstrained OS environment on untrusted inputs, and he gets pwnd in the result, then it's not a security problem, it's an incompetence problem - and IMO it should be discussed elsewhere.

does not suggest that the program is very well designed.

Scanning for byte strings with no possibility of security flaw is a solved problem.

Separation is done. 1/3 points done: https://news.ycombinator.com/item?id=9441262

I didn't realise OpenBSD had a seccomp equivalent, but I'm happy it does! (And it did make the news again.)

Systrace was added to the base system 10 or 11 years ago (3.6, IIRC), but there was some hesitation among some key team members to use it widely (the argument being that privilege separation utilities themselves will almost always have security problems). That ship seems to have sailed, though, in more recent years.
I think the lack of adoption was because there were vulnerabilities in the design that I don't think were ever resolved; it simply can't work as the only line of defense. http://undeadly.org/cgi?action=article&sid=20070809201304

It's too bad; I think system calls are a very good place to apply security policies. I think the issue is that one can modify the memory structures pointed to by a system call after it has been "approved" by systrace policy, but before the kernel acts on it. While ownership of such data structures is in userspace, it's perfectly fine to modify such regions.

It's too bad; I think it's possibly the most straightforward approach compared to SELinux or MAC.

I've added support for Linux seccomp filtering to my portable tree on GitHub:

https://github.com/brynet/file