This seems like a good opportunity to use wasm on the server to sandbox the processing of user-provided content. Of course they could also try rewriting in a safer language, but given that this already exists and handles all their content, wasm might be a simple defense-in-depth protection.
What Dropbox did for this sort of thing is ideal. You spawn a child process that has two file handles piped to/from the parent - stdin, stdout.
That child process does the scary stuff - parsing. Parsing requires zero system calls. Talking to the parent requires only read and write, but not open, so the child can only read from and write to those two file descriptors.
And exit.
That's it. Seccomp v1 (strict mode) is trivial to apply, gives 4 system calls, and makes the process virtually useless to an attacker. If you want to get fancy and allow for multithreading you can use seccomp v2 (filter mode) and create your threadpool before you drop privs, and probably add futex and mmap.
You pay a latency cost but the security win is huge.
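The parent side of that arrangement can be sketched in Rust with only the standard library. This is an illustration, not Dropbox's code: `parse_in_child` and its parameters are hypothetical names, and the seccomp step is left out because it needs bindings outside `std` (the worker would typically call `prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT)` before touching any input).

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Runs `program` as a child process, sends `input` to its stdin and
/// returns everything it wrote to stdout. The worker is assumed to lock
/// itself down (e.g. with seccomp) before it reads the untrusted bytes.
fn parse_in_child(program: &str, args: &[&str], input: Vec<u8>) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write on a separate thread so a large input cannot deadlock against
    // a child that is blocked writing its own output.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let writer = thread::spawn(move || stdin.write_all(&input));

    // Reads stdout to the end, then waits for the child to exit.
    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;

    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "worker exited with failure"));
    }
    Ok(output.stdout)
}
```

The parent should still treat the returned bytes as untrusted: a compromised worker cannot open files or sockets, but it can emit whatever output it likes.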
That's a lot of complicated, non-portable steps, with many subtle semantics that can easily be implemented incorrectly.
Running the code in a Wasm sandbox sounds a whole lot easier and less error prone. You do have to trust the Wasm engine, but nothing else. And you don't need in-depth knowledge of OS security mechanisms.
No one cares about portability on the backend. This is a service - github dictates where it runs. I don't see this as being any more complex or involving any more "subtle semantics" than bringing an entire VM and new compiler target along.
Nothing I mentioned requires knowledge of OS security mechanisms beyond what I've described in my short comment.
One thing that I have come to accept is that if one cares about security, the only path is multiple processes; shared library plugins and background threads are a window waiting to be broken.
I think the point was that you can’t corrupt the containing process, and wasm separates code from data (Harvard arch?) so you don’t get arbitrary code exec. Of course if you process the output of the wasm in a trusted environment, the compromised wasm could generate something that compromises the host, but the same applies to using separate processes and IPC.
I don't know of anyone who claims that programs in web assembly are safe internally.
The security claims are entirely that gaining arbitrary execution inside the wasm sandbox does not give you arbitrary execution in the host.
The benefit of a wasm sandbox over a process sandbox is entirely in the overhead reduction - but that does come at the cost of wasm being generally slower than native compilation (oh tradeoffs we will never escape you)
This can be mitigated by creating a new WASM instance for every job. Even if there is internal corruption, the most it can affect is the output of the single task, nothing else.
That can of course be enough to cause damage, but the attack surface is still much smaller and makes RCE a lot less useful, especially if capabilities are used to strictly limit the syscall surface for the WASM side (with reference types / interface type resources).
WASM isn't a magical security panacea, but it does offer solutions.
Of course not using languages that are prone to these attacks in the first place is a better fix.
I'm naive when it comes to WASM, but the first thing I thought is "that sounds conceptually the same as spawning a child process." Are there significant differences?
Do you have a POC of such an attack? If true that would mean web browsers would be vulnerable executing wasm because you can intentionally feed it a program with out of bounds access.
I am a C++ fanatic (template metaprogramming is a beautiful thing), but I've come to believe that software that handles untrusted user input should never be written in C or C++. It's too difficult to write correct software by hand; memory-safe languages are really the only way.
Does there actually exist any practical way to ensure user input does not cause mischief when authoring C/C++ programs at scale? Are memory-safe languages the only answer?
Much worse than that, even memory-safe languages like (safe) Rust, and the inevitable suggestion of AUTOSAR and so on, aren't the answer. To properly answer your demand for a "practical way to ensure user input does not cause mischief" you want a drastically less capable language which cannot even in principle express the programs that should not exist. That's exactly what WUFFS is for.
This sort of bug can't happen in WUFFS because you can't express the idea "corrupt the heap memory" even if you desperately wanted to. The tell-tale sign of such languages is that they are not general purpose languages, because those are able to express a wide variety of stupid things you don't want to do.
Sure, for example a Rust program is allowed to open, read and write files. On Linux, one of the files it's allowed to open, /proc/self/mem, points directly into its own heap; another is a list of what's in that heap and where. It can use these to scrawl on the heap and cause havoc.
That's the price of being a General Purpose programming language. We don't know if you might want to scribble on your own heap, or delete all the files labelled "Important, DO NOT DELETE" or mail a copy of the password database to a throwaway account, and so you can do all those things. Those Linux files pointing into process address space aren't a mistake, I wrote code that needs them (and then I ported it to safe Rust months ago) but with great power comes great box office potential or something like that.
Now you might say, "I'm sure I won't get something so obvious wrong", but the trouble is that's what the people who wrote this GitHub code apparently thought too. Hence I say we should use specialised languages with a deliberately narrower scope where this category of mistake is impossible.
WUFFS as it stands would be pretty exhausting to write a Markdown parser in because WUFFS doesn't believe in strings, at all. But it's already a better fit for this problem than C++ because the worst case scenario can't happen.
Oh interesting, thanks a lot for elaborating. Question: why would a memory-safe Rust program open a sensitive file like /proc/self/mem? Wouldn't that require corrupting memory somehow, which presumably the Rust compiler would be proving impossible? Or are you suggesting the file name might come from user input, and the user input would somehow find an exploit to redirect it to that file? Would that be possible/likely in any way for something like parsing?
Somebody call the Lockpicking Lawyer to shove a paperclip in these "security standards". They're flimsy attempts to excuse still doing something that's a bad idea (programming safety critical software in respectively C and C++) by promising to try harder to achieve the impossible standards needed by humans programming these languages.
And I do mean flimsy. Here's a fun example from a random copy of the AUTOSAR guidelines I found online labelled 17-03. AUTOSAR says if I have two 8-bit signed integers and I add them, that might overflow, which is bad. So, what if I simply check that they're both less than 100, no more overflow? "Correct," says the AUTOSAR guide; this is apparently OK.
Huh. Signed 8-bit integer. 99 + 99 = -58. This is probably not what the person who purchased your car thought the answer was, I hope whatever accident you just caused isn't fatal.
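The arithmetic can be checked with a small Rust sketch. The function names are hypothetical, and the guard is only a paraphrase of the rule as described above, not a quotation from the AUTOSAR document.

```rust
/// The guard as described: accept the operands if both are below 100.
fn guard_passes(a: i8, b: i8) -> bool {
    a < 100 && b < 100
}

/// Two's-complement wraparound, i.e. what an 8-bit signed result would hold.
fn add_wrapping(a: i8, b: i8) -> i8 {
    a.wrapping_add(b)
}

/// Addition that reports overflow instead of returning a wrong number.
fn add_checked(a: i8, b: i8) -> Option<i8> {
    a.checked_add(b)
}
```

With both operands at 99 the guard passes, `add_wrapping` yields -58 (198 does not fit in the range -128..=127, and 198 - 256 = -58), and `add_checked` returns `None`. A guard that actually works has to bound the sum, not each operand separately.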
Integer overflow can happen in Rust, but it's well-defined, not undefined. This helps.
Bounds checking is part of indexing, and so even if an index overflows, the check should happen, and panic.
"impossible" is a strong word, but it would be significantly less likely in Rust. If you did the same thing as you did in C, with unsafe, then it could happen. But there's not a lot of reason to 99.9999% of the time, as it's the more difficult and less ergonomic option.
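A minimal sketch of how that plays out in safe Rust, with hypothetical helper names: the index arithmetic is allowed to wrap, but the slice access is still checked against the length.

```rust
/// Computes an index in a deliberately narrow type so that it can wrap.
fn wrapped_index(base: u8, offset: u8) -> usize {
    usize::from(base.wrapping_add(offset))
}

/// Checked lookup: `get` returns `None` when the index is out of range.
fn lookup(items: &[u32], index: usize) -> Option<u32> {
    items.get(index).copied()
}

/// Plain indexing performs the same bounds check but panics on failure
/// instead of reading past the end of the slice.
fn lookup_or_panic(items: &[u32], index: usize) -> u32 {
    items[index]
}
```

If the wrapped index happens to land inside the slice, the program reads the wrong element, which is a logic bug; if it lands outside, the access is refused. In neither case is adjacent memory touched.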
Thanks. I should look at the code. I thought unsigned int overflow would wrap to zero, which would still be in bounds for a nontrivial array. Maybe they’re freeing the item at that index the first time through the array or something.
I’m on my phone now, but they cut two different patches to two different releases; I suspect I linked to one and you the other. Harder to double check that when I’m not at a computer, though that one does have far better comments and I should have linked to it, thank you. I basically picked one at random.
Yes. Heap Memory Corruption is a type of memory safety issue that's impossible in Safe Rust. (As usual, this depends on any unsafe code and the compiler being bug-free, but that's supposed to be much easier to prove since the "scope" of things to check for correctness is much reduced).
Rust programs don't call `malloc` directly, so the problem of overflow in malloc size calculation is mitigated by never needing to write such code (Rust programs use something like Vec, which is a safe abstraction that reliably (re)allocates as much as required.)
Rust's lack of implicit numeric conversions pushes authors towards using usize (size_t) for everything. So in Rust you'd be more likely to have a denial of service due to supporting 2^64 columns. If you tried to carelessly use u16 for the number of columns, you'd more likely have an application level bug like incorrect page rendering, or in the worst case a panic (equivalent of an uncaught C++ exception, which may be a program-stopping bug, but not a vulnerability).
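As a rough illustration of those two points (names and the size cap are hypothetical, not taken from any real parser): the size calculation uses checked multiplication and a caller-supplied limit, and narrowing to `u16` has to be requested explicitly.

```rust
use std::convert::TryFrom;

/// Allocates a zeroed rows x cols table. Refuses sizes whose product
/// overflows `usize` or exceeds `max_cells`, the caller's own limit
/// against denial of service.
fn alloc_table(rows: usize, cols: usize, max_cells: usize) -> Option<Vec<u8>> {
    let cells = rows.checked_mul(cols)?;
    if cells > max_cells {
        return None;
    }
    // `vec!` allocates exactly `cells` elements; no manual size arithmetic.
    Some(vec![0u8; cells])
}

/// Narrowing conversion is explicit and fallible, never implicit.
fn column_count(raw: usize) -> Option<u16> {
    u16::try_from(raw).ok()
}
```

A careless `raw as u16` would still compile and silently truncate, which is the application-level bug described above, but it cannot turn into an undersized allocation followed by an out-of-bounds write.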
Unexpected overflow traps by default in Swift and in Rust debug builds; Rust release builds wrap unless overflow checks are enabled, and Go wraps silently. Swift and Rust use different operators or functions for when overflow is OK.
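In Rust the explicit variants are methods on the integer types. A small sketch (the wrapper function is hypothetical; the four methods are from the standard library):

```rust
/// Returns the result of adding `a` and `b` under each explicit overflow
/// policy: checked, wrapping, saturating, and overflowing.
fn add_variants(a: u8, b: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        a.checked_add(b),     // None on overflow
        a.wrapping_add(b),    // result modulo 256
        a.saturating_add(b),  // clamps at u8::MAX
        a.overflowing_add(b), // wrapped result plus an overflow flag
    )
}
```

For 200 + 100 this gives `None`, 44, 255 and `(44, true)` respectively. Plain `a + b` panics on the same inputs in a debug build and wraps in a release build unless `overflow-checks` is turned on.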