|
|
|
|
|
by jstimpfle
1565 days ago
|
|
> Difficult-to-detect undefined behaviour is a significant problem with C and C++, otherwise it wouldn't be a major cause of serious security vulnerabilities in well-resourced, high-profile, security-sensitive C/C++ codebases. Do you mean difficult-to-detect but "obviously" bugs that can lead to UB, like buffer overflows that only happen in rare circumstances (like with unsanitized input)? Or do you mean difficult to detect unexpected "miscompilations" by the compiler based on some UB that is non-obvious to most programmers and/or not well-known? Because I was referring to the latter, and in my perception most talk about UB is. I haven't seen the latter happen myself, and I haven't read that many horror stories where this actually happened. For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect. Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"? I can see that the line is not well defined here of course, because technically it's all just UB - but the distinction really was my point, which I made from a practical perspective. |
|
> Do you mean difficult-to-detect but "obviously" bugs that can lead to UB [...]? Or do you mean difficult to detect [...] UB that is non-obvious to most programmers and/or not well-known?
I meant both, as I wasn't drawing this distinction.
> I haven't read that many horror stories where this actually happened
You may be right that most of the problematic instances of UB are doing something that's clearly not right even to inexperienced programmers, but again I don't see much value in making the distinction. I mentioned the Chromium stats earlier. Those security issues are real. Many of those issues presumably wouldn't happen in a language that prevented UB.
(It's not necessarily the case that all the issues would have been avoided, as it's possible a bug might still have manifested, but in a well-defined way. A UB-invoking memory-management bug might correspond to a failure to reset a data-structure, in a safe language. Only the former invokes UB, but both could have security consequences.)
> For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect.
You expect it because you know the C and C++ languages don't offer robust means to avoid undefined behaviour. This doesn't generalise to other languages.
In C and C++, silent data-corruption can occur if you try to write into an array but go out-of-bounds. That doesn't happen in many other modern languages. Unlike Ada, C doesn't even let you enable robust runtime bounds-checks for your debug builds. (To my knowledge no C compiler is able to offer this feature.)
> Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"?
I wasn't drawing the distinction at all. I'll use an example from the plainly erroneous category.
Use-after-free and double-free errors are both undefined behaviour, in C and in C++. In Safe Rust, they cannot ever arise because of the way the language is defined, so the door is closed on those forms of runtime error. We don't need to talk about blame.
Memory-management issues of this sort are precisely the kinds of issues that affect Chromium. Not even Google can keep undefined behaviour out of their most high-profile C++ codebase. (In fairness though, the codebase would likely be quite different if written from scratch today in modern C++.)
On the not plainly erroneous side: a subtlety that can result in security consequences is the elision of a memset call by an optimising compiler which can result in sensitive data not getting zeroed out before deallocation. This isn't actually UB, but it's a good example of 'language lawyer' trouble with C/C++. [0]
As an aside: I once spotted a competent C++ programmer (vastly more experienced than myself) writing code which applied memcpy to an array of non-POD objects. If you'll forgive a painful mix of metaphors: C++ is large enough that even experienced programmers can have blind spots to its dark corners.
[0] https://stackoverflow.com/a/56565637/