by tines 1566 days ago
I am a C++ fanatic---template metaprogramming is a beautiful thing---but I've come to believe that software that handles untrusted user input should never be written in C or C++. It's too difficult to write correct software by hand; memory-safe languages are really the only way.
2 comments

Does there actually exist any practical way to ensure user input does not cause mischief when authoring C/C++ programs at scale? Are memory-safe languages the only answer?
Much worse than that: even memory-safe languages like (safe) Rust, and the inevitable suggestions of AUTOSAR and so on, aren't the answer. To properly answer your demand for a "practical way to ensure user input does not cause mischief" you want a drastically less capable language which cannot even in principle express the programs that should not exist. That's exactly what WUFFS is for.

https://github.com/google/wuffs

This sort of bug can't happen in WUFFS because you can't express the idea "corrupt the heap memory" even if you desperately wanted to. The tell-tale sign of such languages is that they are not general purpose languages, because those are able to express a wide variety of stupid things you don't want to do.

Could you expand a bit on what you mean regarding memory-safe Rust not being the answer to "cannot corrupt the heap memory"?
Sure. For example, a Rust program is allowed to open, read and write files. On Linux, one of the files it's allowed to open, /proc/self/mem, points directly into its own heap; another is a list of what's in that heap and where. It can use these to scrawl on the heap and cause havoc.

That's the price of being a General Purpose programming language. We don't know if you might want to scribble on your own heap, or delete all the files labelled "Important, DO NOT DELETE" or mail a copy of the password database to a throwaway account, and so you can do all those things. Those Linux files pointing into process address space aren't a mistake, I wrote code that needs them (and then I ported it to safe Rust months ago) but with great power comes great box office potential or something like that.

Now you might say, "I'm sure I won't get something so obvious wrong", but the trouble is that's what the people who wrote this GitHub code apparently thought too. Hence I say we should use specialised languages with a deliberately narrower scope where this category of mistake is impossible.

WUFFS as it stands would be pretty exhausting to write a Markdown parser in because WUFFS doesn't believe in strings, at all. But it's already a better fit for this problem than C++ because the worst case scenario can't happen.

Oh interesting, thanks a lot for elaborating. Question: why would a memory-safe Rust program open a sensitive file like /proc/self/mem? Wouldn't that require corrupting memory somehow, which presumably the Rust compiler would be proving impossible? Or are you suggesting the file name might come from user input, and the user input would somehow find an exploit to redirect it to that file? Would that be possible/likely in any way for something like parsing?
That's because opening and writing to arbitrary files is marked as a safe operation in Rust, and sometimes Rust programs open files whose names were supplied by untrusted, potentially malicious input. As said in the other comment, this can lead to UB: https://github.com/rust-lang/rust/issues/32670 (it's really Linux that is at fault here, for offering a hugely unsafe API through the filesystem). So we could imagine a paranoid (but technically correct) version of Rust where the APIs for opening and/or writing to a file were marked as unsafe.

But, since the operations in actual Rust are marked as safe, the compiler doesn't provide any checks here: we can cause UB in code without any unsafe { }. Moreover, checking if the path starts with /proc isn't enough to make the UB go away: procfs can be mounted on any dir, there can be bind mounts further obscuring the file resolution, etc.

This means that if you really care about memory safety (and correctness in general), the precise way you set up your environment is also critical, down to the smallest details. It's as if your Dockerfile had a metaphorical unsafe { } block around it: in a system that doesn't mount /proc you just closed a whole host of bugs, and a buggy system that mounts procfs in other dirs may cause arbitrary havoc. (Note that mounting procfs is a privileged operation.)

There are low-level languages that, unlike Rust, completely prevent memory safety errors, like ATS. In ATS you can deal with pointers and pointer arithmetic (like in C or Rust), but to follow a pointer you need to provide a mathematical proof that it is valid. This is enough if we consider the program in isolation, but programs are never run in isolation. A proper mathematical proof of memory safety needs to consider ALL software running in the system, globally: then everything is mathematically verified, and the build step can just reject an unsound system setup.

That way we could theoretically be more precise about our memory safety guarantees: opening and writing to a file is safe, but only if procfs isn't mounted. If procfs is mounted anywhere, then this may go wrong: we need to prove we aren't doing something bad. This means that in a system where sysadmins can just log in and mount random stuff, writing to files must be unsafe!

Of course that's not very practical. It would be cumbersome to prove you're not doing /proc shenanigans every time you messed with files. And arguably, any program that opens arbitrary filenames that came from untrusted input is buggy anyway. You should always validate filenames, especially to confine input to some directory (when applicable), rejecting paths with ../ that escape it, for example. And any setup that mounts procfs outside of /proc is irreparably broken. We don't have a tool to automatically check for such issues, but if those two rules are followed, we won't have UB here.
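A minimal sketch of that kind of filename validation, in Rust. The function name `confine` and the `/srv/uploads` directory are made up for illustration. The check is purely lexical: it knows nothing about symlinks, bind mounts, or where procfs is mounted, so it covers only the "../" rule and not the environment one.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins an untrusted relative path onto `base`, rejecting anything that
/// could lexically escape it. Hypothetical helper: it does not resolve
/// symlinks and knows nothing about mount points.
fn confine(base: &Path, untrusted: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    for component in Path::new(untrusted).components() {
        match component {
            // An ordinary name: descend into it.
            Component::Normal(part) => out.push(part),
            // A leading `./` changes nothing.
            Component::CurDir => {}
            // `..`, a root `/`, or a Windows prefix could all leave `base`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

fn main() {
    let base = Path::new("/srv/uploads");
    println!("{:?}", confine(base, "notes/readme.md")); // accepted
    println!("{:?}", confine(base, "../../proc/self/mem")); // rejected: contains `..`
    println!("{:?}", confine(base, "/proc/self/mem")); // rejected: absolute path
}
```

Rejecting `..` outright instead of trying to normalise it away is the simpler choice; it refuses a few harmless paths, but there is less to get wrong.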

How to do better than that? We need better system-level APIs, in which operations that are "obviously" safe can really be 100% memory safe all the time.

Yes, the security standards like MISRA and AUTOSAR basically castrate C and C++ into subsets similar to those languages.
Somebody call the Lockpicking Lawyer to shove a paperclip in these "security standards". They're flimsy attempts to excuse still doing something that's a bad idea (programming safety critical software in respectively C and C++) by promising to try harder to achieve the impossible standards needed by humans programming these languages.

And I do mean flimsy. Here's a fun example from a random copy of the AUTOSAR guidelines I found online, labelled 17-03. AUTOSAR says if I have two 8-bit signed integers and I add them, that might overflow, which is bad. So, what if I simply check that they're both less than 100? No more overflow? "Correct", says the AUTOSAR guide; this is apparently OK.

Huh. Signed 8-bit integer. 99 + 99 = -58. This is probably not what the person who purchased your car thought the answer was, I hope whatever accident you just caused isn't fatal.
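To make the arithmetic concrete, here is a small Rust sketch. The function names are made up, and the guard is a reconstruction of the check as described above, not a copy of the AUTOSAR text. `wrapping_add` stands in for what two's-complement 8-bit storage does to the result.

```rust
/// The guard described above: both operands "less than 100", then add.
/// Hypothetical reconstruction for illustration only.
fn add_with_flawed_guard(a: i8, b: i8) -> Option<i8> {
    if a < 100 && b < 100 {
        // Wraps silently when the true sum does not fit in 8 bits.
        Some(a.wrapping_add(b))
    } else {
        None
    }
}

/// Checks the sum itself rather than the operands.
fn add_checked(a: i8, b: i8) -> Option<i8> {
    a.checked_add(b)
}

fn main() {
    println!("{:?}", add_with_flawed_guard(99, 99)); // Some(-58)
    println!("{:?}", add_checked(99, 99)); // None
    println!("{:?}", add_checked(60, 60)); // Some(120)
}
```

The guard also says nothing about large negative operands: -100 and -100 pass it just as happily.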

I agree, but our opinion has zero value for whoever calls the shots in such industries.
Is this a vulnerability that would be impossible in, let's say, Rust?
This seems to be the patch: https://github.com/github/cmark-gfm/commit/cf7577d2f74289cb8...

Integer overflow can happen in Rust, but it's well-defined, not undefined. This helps.

Bounds checking is part of indexing, and so even if an index overflows, the check should happen, and panic.
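A rough illustration of both points, assuming a made-up `lookup` helper and a 16-bit index (neither is taken from the cmark-gfm code). The overflow is spelled out with `wrapping_add`, and the bounds check still runs on whatever index comes out.

```rust
/// Computes an index with explicitly wrapping 16-bit arithmetic, then looks
/// it up. `get` returns None instead of reading out of bounds; plain
/// `cells[index]` would panic in the same situation.
fn lookup(cells: &[u8], base: u16, offset: u16) -> Option<u8> {
    let index = base.wrapping_add(offset) as usize;
    cells.get(index).copied()
}

fn main() {
    let cells = [10u8, 20, 30, 40];
    println!("{:?}", lookup(&cells, 1, 2)); // Some(40)
    println!("{:?}", lookup(&cells, 65_535, 3)); // Some(30): wrapped to index 2
    println!("{:?}", lookup(&cells, 65_530, 3)); // None: index 65533 is out of bounds
}
```

The middle case is still a logic bug (the wrong cell comes back), but nothing outside the slice is ever touched.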

"Impossible" is a strong word, but it would be significantly less likely in Rust. If you did the same thing as you did in C, with unsafe, then it could happen. But 99.9999% of the time there's little reason to, as it's the more difficult and less ergonomic option.

Is this unsigned integer overflow? Isn’t that well defined in C++ as well?

Edit: I didn’t research where the corruption comes from in this bug.

Edit again: it looks like the source file is actually C and not C++.

Yep, well-defined in C++, but the resulting out-of-bounds accesses and all that are not well-defined.
Thanks. I should look at the code. I thought unsigned int overflow would wrap to zero, which would still be in bounds for a nontrivial array. Maybe they’re freeing the item at that index the first time through the array or something.
Well defined and it also panics in debug mode. Unit tests tend not to catch these sorts of bugs tbh, but still, nice to have :)
the actual commit fix has some comments that may be useful for understanding:

https://github.com/github/cmark-gfm/commit/ac80f7b56522ffa15...

I’m on my phone now, but they cut two different patches to two different releases; I suspect I linked to one and you the other. Harder to double check that when I’m not at a computer, though that one does have far better comments and I should have linked to it, thank you. I basically picked one at random.
no worries, it's the same patch to two different releases; you linked to the merge commit & I linked to the fix commit.

source: i cut the releases ;)

Ah ha! I should have realized. Thank you for your hard work, this kind of thing is never easy.
Yes. Heap Memory Corruption is a type of memory safety issue that's impossible in Safe Rust. (As usual, this depends on any unsafe code and the compiler being bug-free, but that's supposed to be much easier to prove since the "scope" of things to check for correctness is much reduced).
Rust programs don't call `malloc` directly, so the problem of overflow in malloc size calculation is mitigated by never needing to write such code (Rust programs use something like Vec, which is a safe abstraction that reliably (re)allocates as much as required.)
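As a sketch of what that looks like in practice (the `new_grid` name and the rows-by-columns shape are assumptions for illustration, not anything from the patched code): the size calculation is the only place left where overflow can occur, and it can be made explicit.

```rust
/// Allocates a zeroed rows x cols grid. Returns None if the element count
/// overflows usize, instead of allocating a too-small buffer.
fn new_grid(rows: usize, cols: usize) -> Option<Vec<u8>> {
    let len = rows.checked_mul(cols)?;
    Some(vec![0u8; len])
}

fn main() {
    println!("{:?}", new_grid(3, 4).map(|g| g.len())); // Some(12)
    println!("{:?}", new_grid(usize::MAX, 2).map(|g| g.len())); // None
}
```

A count that fits in usize but is still absurdly large would make the allocation itself fail and abort the process, so a real program would also want an upper limit on top of this.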

Rust's lack of implicit numeric conversions pushes authors towards using usize (size_t) for everything. So in Rust you'd be more likely to have a denial of service due to supporting 2^64 columns. If you tried to carelessly use u16 for the number of columns, you'd more likely have an application level bug like incorrect page rendering, or in the worst case a panic (equivalent of an uncaught C++ exception, which may be a program-stopping bug, but not a vulnerability).
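For illustration, the two ways of squeezing a column count into a u16 might look like this (both function names are hypothetical). The explicit `as` cast is the careless route and truncates silently; `try_from` makes the caller handle the oversized case.

```rust
use std::convert::TryFrom;

/// Careless narrowing: `as` keeps only the low 16 bits.
fn column_count_careless(cells_in_row: usize) -> u16 {
    cells_in_row as u16
}

/// Checked narrowing: oversized rows become an error the caller must handle.
fn column_count(cells_in_row: usize) -> Result<u16, String> {
    u16::try_from(cells_in_row).map_err(|_| {
        format!("row has {} columns; at most {} are supported", cells_in_row, u16::MAX)
    })
}

fn main() {
    println!("{}", column_count_careless(70_000)); // 4464
    println!("{:?}", column_count(12)); // Ok(12)
    println!("{:?}", column_count(70_000)); // Err(...)
}
```

Either way the outcome is a wrong number or an error value, not a write past the end of a buffer.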

Unexpected overflow faults in most modern safe languages (rust, swift, presumably go?) by default - they generally use different operators or functions for when overflow is ok.