HN Mirror


by jstimpfle 1565 days ago

> Difficult-to-detect undefined behaviour is a significant problem with C and C++, otherwise it wouldn't be a major cause of serious security vulnerabilities in well-resourced, high-profile, security-sensitive C/C++ codebases.

Do you mean difficult-to-detect but "obvious" bugs that can lead to UB, like buffer overflows that only happen in rare circumstances (such as with unsanitized input)? Or do you mean difficult-to-detect, unexpected "miscompilations" by the compiler, based on some UB that is non-obvious to most programmers and/or not well known?

I was referring to the latter, and in my perception most talk about UB is about the latter too. I haven't seen the latter happen myself, and I haven't read that many horror stories where this actually happened.

For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect.

Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"? I can see that the line is not well defined here of course, because technically it's all just UB - but the distinction really was my point, which I made from a practical perspective.


MaxBarraclough 1560 days ago

Sorry for the late reply here:

> Do you mean difficult-to-detect but "obvious" bugs that can lead to UB [...]? Or do you mean difficult-to-detect [...] UB that is non-obvious to most programmers and/or not well known?

I meant both, as I wasn't drawing this distinction.

> I haven't read that many horror stories where this actually happened

You may be right that most of the problematic instances of UB are doing something that's clearly not right even to inexperienced programmers, but again I don't see much value in making the distinction. I mentioned the Chromium stats earlier. Those security issues are real. Many of those issues presumably wouldn't happen in a language that prevented UB.

(It's not necessarily the case that all the issues would have been avoided, as it's possible a bug might still have manifested, but in a well-defined way. A UB-invoking memory-management bug might correspond to a failure to reset a data-structure, in a safe language. Only the former invokes UB, but both could have security consequences.)

> For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect.

You expect it because you know the C and C++ languages don't offer robust means to avoid undefined behaviour. This doesn't generalise to other languages.

In C and C++, silent data-corruption can occur if you try to write into an array but go out-of-bounds. That doesn't happen in many other modern languages. Unlike Ada, C doesn't even let you enable robust runtime bounds-checks for your debug builds. (To my knowledge no C compiler is able to offer this feature.)

> Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"?

I wasn't drawing the distinction at all. I'll use an example from the plainly erroneous category.

Use-after-free and double-free errors are both undefined behaviour, in C and in C++. In Safe Rust, they cannot ever arise because of the way the language is defined, so the door is closed on those forms of runtime error. We don't need to talk about blame.
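For contrast with Safe Rust, here is a minimal C sketch of a common defensive convention (an illustration, not something from this thread): a hypothetical `FREE_AND_NULL` macro that resets a pointer variable after freeing it. Because `free(NULL)` is defined to do nothing, a repeated free through the same variable becomes harmless. It is only a convention, though: it does nothing for other copies of the pointer, which is the gap a language-level guarantee closes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro (an assumption for illustration): free the allocation
   and reset the pointer variable so a later free() on it is a no-op. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Small self-check showing the convention in use. */
static void free_and_null_selfcheck(void)
{
    int *data = malloc(4 * sizeof *data);
    if (data == NULL)
        return;                 /* allocation failed; nothing to show */
    data[0] = 42;

    FREE_AND_NULL(data);
    assert(data == NULL);

    /* A second call is well defined because free(NULL) does nothing. */
    FREE_AND_NULL(data);
    assert(data == NULL);
}
```

Calling `free_and_null_selfcheck()` from `main` exercises both paths. Any alias of `data` taken before the first call would still dangle, so this reduces the problem rather than removing it.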

Memory-management issues of this sort are precisely the kinds of issues that affect Chromium. Not even Google can keep undefined behaviour out of their most high-profile C++ codebase. (In fairness though, the codebase would likely be quite different if written from scratch today in modern C++.)

On the not plainly erroneous side: a subtlety that can result in security consequences is the elision of a memset call by an optimising compiler which can result in sensitive data not getting zeroed out before deallocation. This isn't actually UB, but it's a good example of 'language lawyer' trouble with C/C++. [0]
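A minimal sketch of one common workaround for the elided `memset`, assuming a hypothetical helper named `secure_zero` (not from the linked answer): the stores go through a volatile-qualified pointer, which compilers in practice do not treat as dead stores. Where available, a platform facility such as `explicit_bzero` or the optional C11 Annex K `memset_s` is usually the better choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): zero n bytes at p
   through a volatile-qualified pointer so the stores are not optimised away
   as dead writes. */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n > 0) {
        *vp++ = 0;
        n--;
    }
}

/* Small self-check: fill a buffer, wipe it, confirm every byte is zero. */
static void secure_zero_selfcheck(void)
{
    unsigned char key[16];
    memset(key, 0xAB, sizeof key);      /* stand-in for real key material */
    secure_zero(key, sizeof key);
    for (size_t i = 0; i < sizeof key; i++)
        assert(key[i] == 0);
}
```

The self-check only shows that the bytes end up zero. Whether a particular compiler would have removed a plain `memset` at the same spot depends on the optimisation level and the surrounding code.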

As an aside: I once spotted a competent C++ programmer (vastly more experienced than myself) writing code which applied memcpy to an array of non-POD objects. If you'll forgive a painful mix of metaphors: C++ is large enough that even experienced programmers can have blind spots to its dark corners.

[0] https://stackoverflow.com/a/56565637/


jstimpfle 1559 days ago

> In C and C++, silent data-corruption can occur if you try to write into an array but go out-of-bounds. That doesn't happen in many other modern languages. Unlike Ada, C doesn't even let you enable robust runtime bounds-checks for your debug builds. (To my knowledge no C compiler is able to offer this feature.)

Valgrind offers bounds checks with malloc() allocations and probably other allocations, and while I don't have extensive experience with valgrind it works surprisingly well. I imagine runtime checks could be made possible for any allocator by offering a slice(ptr, type, count) builtin call. I don't really see the language causing any problems, it's more what compiler output and optimizations we've come to expect.

> Memory-management issues of this sort are precisely the kinds of issues that affect Chromium.

They're just what a C programmer expects. I'm not saying you have to like the outcome but this is not "UB" for me - in the sense that it's been utterly obvious from day 1 that this type of issue can cause corruption (because how would it not?), long before there was any talk about UB.

> As an aside: I once spotted a competent C++ programmer (vastly more experienced than myself) writing code which applied memcpy to an array of non-POD objects. If you'll forgive a painful mix of metaphors: C++ is large enough that even experienced programmers can have blind spots to its dark corners.

I'm not defending C++ (in fact I actively avoid having to deal with it) but I think it's obvious that this sort of code is extremely likely to break.


MaxBarraclough 1557 days ago

> Valgrind offers bounds checks with malloc() allocations and probably other allocations

Right, and Valgrind is very impressive, but it's a very intrusive and heavyweight tool rather than a compiler flag. (I often mention Valgrind by name in these sorts of discussions [0], so we're thinking along the same lines here.)

> I imagine runtime checks could be made possible for any allocator by offering a slice(ptr, type, count) builtin call

It would presumably need 'fat pointers', which have ABI issues.

It would need to cope with statically allocated arrays, heap-allocated arrays, and arrays with automatic lifetime, including VLAs and alloca. It would also need to cope with passing a pointer mid-way into an array (something forbidden in, say, Java).

There would be considerable payoff to solving this problem, but it isn't an easy problem to solve, which is why no compiler does so.
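To make the 'fat pointer' idea concrete, here is a rough library-level sketch in C, assuming hypothetical names (`int_slice`, `slice_of`, `slice_set`, `slice_get`, `slice_from`) that are not part of any compiler or standard. It pairs a pointer with an element count and checks every access, including through a slice that starts mid-way into an array. It relies entirely on the caller supplying the correct count, which is exactly what a compiler-enforced version would have to track automatically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 'fat pointer': a pointer paired with its element count.
   It is twice the size of a plain pointer, hence the ABI concern. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Build a slice over count elements starting at ptr. Nothing here can
   verify that count is correct; the caller is trusted. */
static int_slice slice_of(int *ptr, size_t count)
{
    int_slice s = { ptr, count };
    return s;
}

/* Checked write: returns false instead of writing out of bounds. */
static bool slice_set(int_slice s, size_t i, int value)
{
    if (i >= s.len)
        return false;
    s.ptr[i] = value;
    return true;
}

/* Checked read: returns false instead of reading out of bounds. */
static bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}

/* Slice starting mid-way into the array; the remaining length travels
   with the pointer. An out-of-range start yields an empty slice. */
static int_slice slice_from(int_slice s, size_t start)
{
    if (start == 0)
        return s;
    if (start > s.len)
        return slice_of(NULL, 0);
    return slice_of(s.ptr + start, s.len - start);
}

/* Small self-check covering in-bounds, out-of-bounds and mid-array use. */
static void slice_selfcheck(void)
{
    int buf[4] = { 0, 0, 0, 0 };
    int_slice s = slice_of(buf, 4);
    int v = 0;

    assert(slice_set(s, 3, 7));         /* last valid index */
    assert(!slice_set(s, 4, 9));        /* one past the end is rejected */

    int_slice tail = slice_from(s, 2);  /* covers buf[2] and buf[3] */
    assert(tail.len == 2);
    assert(slice_get(tail, 1, &v) && v == 7);
    assert(!slice_get(tail, 2, &v));
}
```

The sketch covers only `int` elements and does nothing about lifetimes: a slice over a freed or out-of-scope array still passes every check, which hints at why a general solution is hard.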

> I don't really see the language causing any problems, it's more what compiler output and optimizations we've come to expect.

No, it's the language. The behaviour of modern optimising compilers is consistent with the way the language is defined. The way the language is defined, and the categories of undefined behaviour that it permits, are very consequential.

I already gave the example of how arrays work differently in Ada, such that Ada compilers can easily add bounds checks for dev builds, whereas C compilers cannot, so you need Valgrind, which few people use. (Well, more properly, arrays work differently in C than in just about every other language.) I could give a dozen other examples of ways C permits you to introduce serious bugs into your codebase which other languages robustly defend against.

Again, many categories of bug simply cannot occur in other safer languages, because of the way safe languages are defined. Use-after-free, double-free, signed integer overflow, read-before-write, divide-by-zero, out-of-bound access, mis-aligned access, data-races. Using a language like Safe Rust closes the door on every one of those kinds of undefined behaviour.

Recall that these are precisely the kinds of errors that result in major security vulnerabilities. This isn't purely academic. Safe languages like Safe Rust (or even plain old Java, although Java has some unsafe corners) stop those issues arising, and that means better security.

Also, as we've discussed, the way C is defined makes it difficult for C compilers to use robust compile-time or even run-time checks to detect undefined behaviour. The result is that major C codebases get delivered with undefined behaviour bugs, which are a common source of major security vulnerabilities.

> I'm not saying you have to like the outcome but this is not "UB" for me - in the sense that it's been utterly obvious from day 1 that this type of issue can cause corruption (because how would it not?), long before there was any talk about UB.

Respectfully there is no such thing as 'UB for me'. It's an accepted technical term-of-art with a clear definition. People do PhDs on this topic. [1]

You're not engaging with the points I've made about how other languages are defined in such a way as to prevent these issues arising. You seem to be focussing on how it's the programmer's fault, which isn't the point at all.

Also, operations which cause data-corruption in C aren't always this way in other languages. In Java, an out-of-bounds write (into an array) results in an exception being thrown, for instance. In verified SPARK Ada, out-of-bounds writes cannot arise in the first place.

> long before there was any talk about UB

Prior to the standardisation of the C language that may have been true in the sense that perhaps the term undefined behaviour had not yet been coined, but that doesn't have any bearing on our discussion. C was always an unsafe language.

> I think it's obvious that this sort of code is extremely likely to break

And yet it didn't occur to the experienced C++ programmer. The whole idea of non-POD types isn't obvious to a C++ programmer who started out with C.

To understand C or C++ well you need to essentially know the language spec. You can't simply learn by doing, as you may easily be tricked into thinking that bad code is correct and robust. It isn't obvious that unsigned integer addition overflows by wrapping whereas signed integer addition overflow causes undefined behaviour.
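A short C sketch of the unsigned/signed difference, using hypothetical helper names (`wrapping_add_u`, `checked_add_int`) chosen for illustration. Unsigned addition is defined to wrap, so it can simply be performed. Signed addition has to be checked before it happens, because once the overflow has occurred the behaviour is already undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is always
   well defined. */
static unsigned wrapping_add_u(unsigned a, unsigned b)
{
    return a + b;
}

/* Hypothetical helper: signed addition that tests the operands first, so
   the overflowing addition is never executed. Returns false on overflow
   and leaves *out untouched. */
static bool checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Small self-check of both behaviours. */
static void overflow_selfcheck(void)
{
    int r = 0;
    assert(wrapping_add_u(UINT_MAX, 1u) == 0u);     /* wraps to zero */
    assert(checked_add_int(1, 2, &r) && r == 3);
    assert(!checked_add_int(INT_MAX, 1, &r));       /* would overflow */
    assert(!checked_add_int(INT_MIN, -1, &r));      /* would underflow */
}
```

GCC and Clang also provide `__builtin_add_overflow`, which does the same job without the hand-written comparison; the portable version is used here only to keep the sketch compiler-independent.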

Google's Chrome team are unable to keep undefined behaviour out of their (necessarily large and complex) C++ codebase. It's unlikely that you're smarter than them. Even if you are, undefined behaviour continues to be a problem for real C/C++ codebases, resulting in a steady stream of security issues.

[0] https://news.ycombinator.com/item?id=30580138

[1] https://en.wikipedia.org/wiki/John_Regehr


jstimpfle 1557 days ago

> Respectfully there is no such thing as 'UB for me'.

Arguing like that and then saying I'm not "engaging with the points you've made" after you've been talking completely past the points of my OP, well... you're being a bit of a pain in the butt. I totally get your points, so let's agree that we're just looking for different things.

(Btw, a couple of days ago I did try Rust once again for a few hours, intending to convert a simple toy project to it. After a few hours of fighting the compiler, editing boilerplate files, looking for the right crates for basic Win32 interop, waiting for downloads, etc... I quit without making it work. Nothing changed in my feeling that without an extreme (or potentially infinite) investment of energy, C will continue to be more productive for me personally, for what I do - despite all its flaws).
