by foxfluff 1563 days ago
> However, there may still be a number of corner cases of the language like undefined behaviors. This is why people frequently fall into C's traps.

I don't think that's true; the UB that is relevant for day-to-day programming is easily learned.

It's just that a sucker is born every minute and there's always someone who's been taught that C is a high-level assembler. Now and then someone slips a piece of code that assumes as much into a project, and a disproportionate amount of noise is made about it.

Most UB-related bugs in C code are not there because the author didn't know it's UB, but because they simply coded it wrong. People fall into simple buffer overflows, double frees, etcetera all the time, and no one thinks it's OK to overflow a buffer.


Right. All this buzz about UB is a bit much. Sure, it's not great to have to assume that something might break when you upgrade your compiler because the compiler authors made a change to exploit some UB that you didn't know of.

On the other hand, I'm quite confident that I don't have much UB in my code (as running sanitizers from time to time confirms). And miraculously, I've never ever been surprised by "nasal demons".

> All this buzz about UB is a bit much.

No, the problem of undefined behaviour isn't overstated. It's the reason C and C++ have such a poor security track-record.

> I'm quite confident that I don't have much UB in my code

Plenty of C and C++ programmers are confident they don't have UB in their code, and yet we see a constant stream of security issues arising from UB in codebases developed and maintained by the smartest and most motivated C/C++ programmers, such as kernel code and the Chromium codebase.

From the Chromium project: [0]

> Around 70% of our high severity security bugs are memory unsafety problems (that is, mistakes with C/C++ pointers). Half of those are use-after-free bugs.

These bugs are, of course, undefined behaviour. These are serious bugs that presumably made it through their code-review and code-analysis processes.

(Chromium is written in C++ rather than C, but the point stands.)

> running sanitizers from time to time confirms

It doesn't. Sanitizers only report UB on the code paths that actually execute while they're enabled. Modern static analysis tools are pretty smart, but they're a long way from being able to detect every instance of possible UB with no false positives. If they were, we'd be in a very different place.

With the current state of the art, the only way we have of developing C/C++ codebases that are free of UB is to use formal methods (e.g. seL4). This approach is very rarely taken, on account of the impact on development speed and the skill needed.

[0] https://www.chromium.org/Home/chromium-security/memory-safet...

I feel you've missed my point. I'm not saying that there are no exploitable or otherwise dangerous bugs. Of course there are, but the behaviour usually comes from obvious logic bugs and the consequences are pretty much as expected - e.g. memory gets corrupted as a consequence of an out-of-bounds write, and then almost anything could happen. And while this is "undefined behaviour", it is not what people mean when they complain about UB.

> I'm not saying that there are no exploitable or otherwise dangerous bugs. Of course there are, but the behaviour usually comes from obvious logic bugs

Difficult-to-detect undefined behaviour is a significant problem with C and C++, otherwise it wouldn't be a major cause of serious security vulnerabilities in well-resourced, high-profile, security-sensitive C/C++ codebases.

There may also be many instances of easily-detected undefined behaviour that exist only because of sloppy software development. There may also be many instances of undefined behaviour that are relatively benign.

> and the consequences are pretty much as expected - e.g. memory gets corrupted as a consequence of an out-of-bounds write, and then almost anything could happen.

I don't think I'm seeing your point. We agree that undefined behaviour can have serious consequences.

> while this is "undefined behaviour", it is not what people mean when they complain about UB

I don't follow. Undefined behaviour is an unambiguous term of art in C and C++ programming. There are plenty of common misconceptions about UB, sure enough, but the term itself is precise.

> Difficult-to-detect undefined behaviour is a significant problem with C and C++, otherwise it wouldn't be a major cause of serious security vulnerabilities in well-resourced, high-profile, security-sensitive C/C++ codebases.

Do you mean difficult-to-detect but "obvious" bugs that can lead to UB, like buffer overflows that only happen in rare circumstances (like with unsanitized input)? Or do you mean difficult-to-detect, unexpected "miscompilations" by the compiler based on some UB that is non-obvious to most programmers and/or not well-known?

Because I was referring to the latter, and in my perception most talk about UB is. I haven't seen the latter happen myself, and I haven't read that many horror stories where this actually happened.
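
To make the latter category concrete: the example I usually see cited is an overflow check that itself relies on signed overflow. A minimal sketch, with function names that are mine and purely illustrative:

```cpp
#include <climits>

// Tempting but broken: for signed int, an overflowing a + b is UB, so an
// optimising compiler is allowed to assume it never happens and may
// remove this check. Left commented out on purpose.
//
//   bool add_overflows_naive(int a, int b) { return b > 0 && a + b < a; }

// UB-free version: compare against the limits before doing any arithmetic.
bool add_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  // INT_MAX - b cannot overflow here
    if (b < 0) return a < INT_MIN - b;  // INT_MIN - b cannot overflow here
    return false;                       // adding zero never overflows
}
```

Whether a given compiler actually drops the naive check depends on the compiler and optimisation level, so treat this as an illustration of the category rather than a guaranteed reproduction.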

For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect.

Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"? I can see that the line is not well defined here of course, because technically it's all just UB - but the distinction really was my point, which I made from a practical perspective.

Sorry for the late reply here:

> Do you mean difficult-to-detect but "obvious" bugs that can lead to UB [...]? Or do you mean difficult-to-detect [...] UB that is non-obvious to most programmers and/or not well-known?

I meant both, as I wasn't drawing this distinction.

> I haven't read that many horror stories where this actually happened

You may be right that most of the problematic instances of UB are doing something that's clearly not right even to inexperienced programmers, but again I don't see much value in making the distinction. I mentioned the Chromium stats earlier. Those security issues are real. Many of those issues presumably wouldn't happen in a language that prevented UB.

(It's not necessarily the case that all the issues would have been avoided, as it's possible a bug might still have manifested, but in a well-defined way. A UB-invoking memory-management bug might correspond to a failure to reset a data-structure, in a safe language. Only the former invokes UB, but both could have security consequences.)

> For better or worse, I'm not angry with the compiler if I code a logic bug and corruption happens. That's just what I expect.

You expect it because you know the C and C++ languages don't offer robust means to avoid undefined behaviour. This doesn't generalise to other languages.

In C and C++, silent data-corruption can occur if you try to write into an array but go out-of-bounds. That doesn't happen in many other modern languages. Unlike Ada, C doesn't even let you enable robust runtime bounds-checks for your debug builds. (To my knowledge no C compiler is able to offer this feature.)

> Maybe if you're indeed saying the latter is a "major cause of security vulnerabilities", could you provide a few examples where it's the language's or compiler's "fault"?

I wasn't drawing the distinction at all. I'll use an example from the plainly erroneous category.

Use-after-free and double-free errors are both undefined behaviour, in C and in C++. In Safe Rust, they cannot ever arise because of the way the language is defined, so the door is closed on those forms of runtime error. We don't need to talk about blame.

Memory-management issues of this sort are precisely the kinds of issues that affect Chromium. Not even Google can keep undefined behaviour out of their most high-profile C++ codebase. (In fairness though, the codebase would likely be quite different if written from scratch today in modern C++.)

On the not plainly erroneous side: a subtlety that can have security consequences is the elision of a memset call by an optimising compiler, which can leave sensitive data un-zeroed before deallocation. This isn't actually UB, but it's a good example of 'language lawyer' trouble with C/C++. [0]
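
A rough sketch of the usual workaround, assuming a hand-rolled helper (secure_zero is a hypothetical name, not a standard function). Writing through a volatile pointer is a widely used idiom for discouraging the compiler from dropping the stores, though how far the standard actually guarantees this is debated, so platform facilities such as explicit_bzero, SecureZeroMemory, or the optional C11 memset_s are generally the better choice where available:

```cpp
#include <cstddef>

// Hypothetical helper: zero a buffer through a volatile pointer so that each
// store is a volatile access, which the optimiser is not expected to elide
// the way it can elide a plain memset on memory that is about to be freed.
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        bytes[i] = 0;
    }
}
```

You'd call it on the secret right before freeing it or letting it go out of scope.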

As an aside: I once spotted a competent C++ programmer (vastly more experienced than myself) writing code which applied memcpy to an array of non-POD objects. If you'll forgive a painful mix of metaphors: C++ is large enough that even experienced programmers can have blind spots to its dark corners.
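
For what it's worth, this particular blind spot can be turned into a compile error. A minimal sketch, assuming a wrapper of my own invention (copy_array is hypothetical); since C++11 the relevant property is "trivially copyable" rather than POD as such:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical wrapper: refuses to compile for types that memcpy must not
// be applied to, instead of silently invoking UB at runtime.
template <typename T>
void copy_array(T* dst, const T* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, src, count * sizeof(T));
}

struct Point { int x; int y; };  // trivially copyable, so memcpy is fine

// std::string manages its own storage and is not trivially copyable, so
// copy_array<std::string>(...) would be rejected at compile time.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must not be memcpy'd");
```

This only guards call sites that go through the wrapper, of course; a raw memcpy elsewhere in the codebase is still on its own.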

[0] https://stackoverflow.com/a/56565637/