Hacker News
by surfer7837 1692 days ago
How can you protect yourself from file upload threats? It's basically the worst possible threat model -- executing complex user input that conforms to a spec that was written 20 years ago by some proprietary company with no security.

Executing everything on an isolated container with no permissions? Audit trail etc/good logging? If someone comes up with an RCE you're basically done for; you can only mitigate it, not completely stop it.

7 comments

If you have to process it at all, do it in a WebAssembly sandbox on the server. Or, alternatively, in a seccomp-secured sandbox that isn't allowed to make any system calls whatsoever, just read data from one file descriptor and write processed data to another.
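
For the "read from one file descriptor, write to another" shape, here is a minimal Python sketch. Assumptions up front: it does NOT install a seccomp filter (the standard library has no binding for that), it only shows the process layout, with a CPU limit from `resource` as a cheap extra guard. `WORKER_SRC`, `_limit_child` and `run_isolated` are made-up names, and the byte-counting worker is a placeholder for a real parser.

```python
import resource
import subprocess
import sys

# Placeholder "parser": reads all of stdin, writes the byte count to stdout.
WORKER_SRC = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.write(str(len(data)))\n"
)


def _limit_child():
    # Runs in the child between fork and exec: lower the soft CPU limit
    # to at most 5 seconds, leaving the hard limit untouched.
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    soft = 5 if hard == resource.RLIM_INFINITY else min(5, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))


def run_isolated(untrusted: bytes) -> bytes:
    # The worker only ever sees the upload on stdin and answers on stdout.
    # NOTE: no seccomp filter is applied here; that would be a separate layer.
    proc = subprocess.run(
        [sys.executable, "-I", "-c", WORKER_SRC],
        input=untrusted,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=10,          # wall-clock guard on top of the CPU limit
        close_fds=True,      # do not leak the parent's other descriptors
        preexec_fn=_limit_child,
        check=True,
    )
    return proc.stdout
```

The parent never parses the upload itself; it only moves bytes in and out. `preexec_fn` is not safe in multi-threaded parents, so treat this as a sketch of the layout rather than something to deploy as-is.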
I've seen companies use Headless Chrome and then WebAssembly to process files. You then lock down the Headless Chrome process. You're then "triple covered"; WebAssembly's limited context, JavaScript engine's limited context, and the Chrome process boundary itself.

This is obviously "expensive" though. Doesn't scale very well.

> This is obviously "expensive" though. Doesn't scale very well.

Unlike this issue then, going by the 1Tbps attack it's reportedly causing...

.... why webassembly?
Yeah, I don't see the value here either. You don't need wasm or chrome or any of that stuff.

Linux itself has several features that can be used to isolate processes, and there are user-friendly tools like bwrap [0] that make configuration easy.

It should be entirely possible to sandbox something like ExifTool itself such that it has no network access and is limited to reading and writing files in a particular directory.

[0] https://wiki.archlinux.org/title/Bubblewrap
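
A rough sketch of what that could look like, as a Python helper that only builds the bwrap command line. Assumptions: a merged-/usr distribution (so binding `/usr` read-only plus a few symlinks is enough for Perl and ExifTool), `exiftool` on the default PATH, and a per-upload directory mounted at `/work`. The function name `bwrap_exiftool_argv` and the `/work` mount point are hypothetical.

```python
import os
from typing import List


def bwrap_exiftool_argv(workdir: str, in_name: str, out_name: str) -> List[str]:
    # Only accept bare file names so nothing can point outside /work.
    for name in (in_name, out_name):
        if os.path.basename(name) != name or name in ("", ".", ".."):
            raise ValueError(f"expected a bare file name, got {name!r}")
    return [
        "bwrap",
        "--unshare-all",        # fresh namespaces, including network: no network access
        "--die-with-parent",    # kill the sandbox if the caller goes away
        "--new-session",        # detach from the controlling terminal
        "--ro-bind", "/usr", "/usr",      # assumption: merged-/usr layout
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--bind", workdir, "/work",       # the only writable path
        "--chdir", "/work",
        # "./" prefix keeps names starting with "-" from being read as options.
        "exiftool", "-all=", "-o", "./" + out_name, "./" + in_name,
    ]
```

You would hand the result to something like `subprocess.run(argv, check=True, timeout=...)`. Whether the bound paths are sufficient depends on where your distribution installs Perl and ExifTool, so expect to adjust them.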

Several reasons:

- It's a separate interface with a different attack surface than your system, so compared to a locked-down version of the normal syscall API, it provides better defense-in-depth.

- It's designed to be a fully self-contained sandbox, by default. If you're locking down everything but reading and writing previously opened file descriptors, you can build a secure sandbox atop syscalls fairly easily. If you need more nuance than that, WebAssembly seems more likely to remain secure, while syscall sandboxes seem more likely to fail-insecure if you get a detail wrong.

- It seems easier to sandbox otherwise-unmodified code that way. If you have code that needs some access to system resources, I think WebAssembly makes it easier to give it just what it needs and nothing else.

(Also, note that I'm not talking about running in a browser; I'm talking about standalone WebAssembly runtimes like wasmtime.)

The first step is always "don't do it at all". Here is the original commit:

https://gitlab.com/gitlab-org/gitlab-workhorse/-/commit/8656...

It's hard to find a linked detailed requirement for this. I would certainly prefer if GitLab didn't silently mangle uploaded images (not least if I'm working on an EXIF library..).

Bonus points for a commit that includes the words "perl" and "exec" not also having a detailed security review attached.

This seems like a great use case for formal methods. e.g. in this case EXIF removers which are formally verified to not crash and successfully remove the identifying data.

These types of programs are relatively simple, and this is a case where a formal proof is much better than reliability.

Is anyone aware of research on this?

The most straightforward answer is to not process the upload at all and treat it as a binary blob. As for serving it as an image etc. on your site, have a strict CSP and turn off MIME sniffing (and don't allow SVG uploads as images).
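
A minimal sketch of the serving side in Python, assuming you store the declared content type next to the blob. `upload_response_headers` and `BLOCKED_TYPES` are made-up names, and the exact CSP string is one reasonable choice, not the only one.

```python
BLOCKED_TYPES = {"image/svg+xml"}  # SVG can carry script, so refuse it as an image


def upload_response_headers(content_type: str) -> dict:
    # Drop any parameters ("; charset=...") and normalise case.
    base = content_type.split(";", 1)[0].strip().lower()
    if base in BLOCKED_TYPES:
        raise ValueError(f"refusing to serve {base} as an image")
    return {
        "Content-Type": base,
        # Tell the browser to trust Content-Type and not sniff the bytes.
        "X-Content-Type-Options": "nosniff",
        # Nothing may load or run in the context of this response.
        "Content-Security-Policy": "default-src 'none'; sandbox",
    }
```

The server never opens the file; it just echoes the bytes back with these headers attached.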
You know, if you do it in a pure Haskell function, you can be assured that the worst it can do is use so many resources that it kills its own process. If you do it in a Rust function, well, you have no formal guarantees, but you have to go really out of your way to put a vulnerability like that in the code.

What you don't do is pull an ages-old Perl codebase to run over complex formats.

If you must Wrangle Untrusted File Formats you should do so Safely:

https://github.com/google/wuffs

I do it inside a systemd nspawn container with a volatile file system, no network, minimal caps.
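
For anyone wanting to try the same, a hedged sketch of such an invocation, again as a Python helper that only builds the command line. Assumptions: a prepared container tree at `image_dir`, a per-job directory bound at `/work`, and an illustrative (not exhaustive) list of capabilities to drop. `nspawn_argv`, `DEFAULT_DROP` and the `/work` path are hypothetical names.

```python
from typing import List, Sequence

# Illustrative subset only; tune to what your processing tool really needs.
DEFAULT_DROP = ("CAP_NET_RAW", "CAP_NET_BIND_SERVICE", "CAP_MKNOD", "CAP_SYS_PTRACE")


def nspawn_argv(image_dir: str, job_dir: str, command: Sequence[str],
                drop_caps: Sequence[str] = DEFAULT_DROP) -> List[str]:
    if not command:
        raise ValueError("command must not be empty")
    return [
        "systemd-nspawn",
        "--quiet",
        f"--directory={image_dir}",
        "--volatile=yes",        # root is a throwaway tmpfs, /usr read-only from the image
        "--private-network",     # only a loopback device inside the container
        "--drop-capability=" + ",".join(drop_caps),
        f"--bind={job_dir}:/work",   # assumption: files go in and out through /work
        "--",
        *command,
    ]
```

Something like `nspawn_argv("/var/lib/machines/exif", "/srv/jobs/42", ["exiftool", "-all=", "/work/in.jpg"])` would then go to `subprocess.run`; systemd-nspawn normally needs root to start.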