by fooker 1808 days ago
Invalidating the entire program is theoretically correct, but not really a useful statement.

In practice, weird miscompilations due to UB are just slightly more difficult to debug than your usual segfault. You can generally keep reducing your problem to localize the issue in the code.

Also, such issues are not very common because the value obtained from a UB operation is usually nonsense (shifting past the bit width, reading an out-of-bounds array element, etc.), so a compiler switching things around is just garbage in, garbage out. It is of course a serious issue if a program actually depends on such a value for a crucial operation. That's how you get exploits, with or without the compiler doing something clever.
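
As a sketch of the "garbage in" case: in C, shifting a 32-bit unsigned value by 32 or more bits is undefined, so a portable program has to guard the shift amount itself rather than rely on whatever the hardware does. `shl32_checked` is a hypothetical helper name, and returning 0 for oversized shifts is just one possible convention, assumed here for illustration.

```c
#include <stdint.h>

/* Shifting a uint32_t by 32 or more bits is undefined behavior in C,
   so a raw `x << n` is meaningless when n >= 32. This hypothetical
   helper defines the oversized case explicitly (result: 0) instead of
   leaving the compiler free to assume n < 32. */
static uint32_t shl32_checked(uint32_t x, unsigned n)
{
    if (n >= 32)
        return 0;      /* chosen convention: every bit shifted out */
    return x << n;     /* well-defined: 0 <= n < 32 */
}
```

Other conventions (masking `n & 31`, as x86 does in hardware, or saturating) are equally valid; the point is only that the program, not the optimizer, decides.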


All I can do is point out that very big names in CS (including Linus) disagree with you: http://www.yodaiken.com/wp-content/uploads/2018/05/ub-1.pdf

This is probably because they've seen the effects of this footgun firsthand. They also remember what C/C++ looked like before the standards bodies declared open season on compilers exploiting UB. There's also a reason why Rust has no UB in its safe dialect, even though it sits on LLVM, infrastructure capable of most or all of the same optimizations.

There was in fact a security issue in the Linux kernel caused directly by a UB-based optimization; without that optimization there would have been no vulnerability. If I recall correctly, the code looked something like:

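    int some_value = ptr->value;  /* dereference first: UB if ptr is NULL */
    if (!ptr) { return; }         /* so the compiler may assume ptr != NULL and delete this */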

This worked correctly in all cases before the compiler added the optimization. It worked less well afterwards: since ptr had already been dereferenced, the compiler was allowed to assume it was non-null and removed the !ptr check as dead code. It also wasn't immediately obvious that this code was broken, because there was no diagnostic or any other indication that upgrading the compiler would suddenly elide the check. It's questionable whether the value these optimizations add at scale outweighs the correctness problems they cause.
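
For comparison, a minimal sketch of the same pattern with the check moved ahead of the dereference, so there is nothing for the optimizer to delete. `struct thing` and `get_value` are hypothetical names standing in for the kernel code, which isn't reproduced here.

```c
#include <stddef.h>

struct thing {
    int value;
};

/* Buggy order (as in the snippet above):
       int some_value = ptr->value;   -- UB if ptr == NULL
       if (!ptr) return;              -- may be deleted: compiler assumes ptr != NULL
   Fixed order: test the pointer before touching it. */
static int get_value(const struct thing *ptr, int fallback)
{
    if (ptr == NULL)
        return fallback;   /* check happens before any dereference */
    return ptr->value;     /* safe: ptr is known to be non-null here */
}
```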

The problem wasn't that it's not fixable. It's that it takes time to find, and you may not find it until after it has been exploited.

That code is buggy if ptr can be null. There isn't an especially rigorous distinction between a bug and a vuln. So saying "this was definitely a safe bug and the compiler elevated it to a vuln" can hold in very specific situations, but it isn't generalizable in any way.

Linus is an important figure, but there are plenty of world experts on the other side of this argument. And there is even a third side that holds that not only should the compiler avoid optimizations based on UB assumptions, but C implementations should instead function more like a virtual machine for a PDP-11 and have everything under the sun be defined (at a great cost in performance).

If the compiler has to take an especially pessimistic view, then really, really basic stuff becomes almost impossible. A write through a pointer is almost equivalent to havocking the entire program state: it invalidates almost all the facts the compiler knows and prevents virtually every optimization near that write. Heck, technically it can even interfere with things like vtables and invalidate any sort of devirtualization, which is hugely valuable for performance.
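
A small illustration of one assumption that keeps a pointer write from acting as a full havoc: under C's strict aliasing rules, a store through a `float *` cannot modify an `int`, so in `store_and_reload` the compiler may return 1 without reloading `*a`. With two `int *` parameters it has no such guarantee and must reload. The function names are hypothetical, and whether the reload is actually elided depends on the compiler and optimization level (GCC and Clang drop the assumption under `-fno-strict-aliasing`).

```c
/* Under strict aliasing, an int object may not be accessed through a
   float lvalue, so the compiler may assume the store to *b leaves *a
   untouched and fold the final load to the constant 1. */
static int store_and_reload(int *a, float *b)
{
    *a = 1;
    *b = 2.0f;     /* cannot legally alias *a */
    return *a;     /* may be compiled as `return 1;` */
}

/* Same shape, but both pointers are int *, so they may alias:
   the compiler must reload *a after the store through b. */
static int store_and_reload_may_alias(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;     /* 1 or 2, depending on whether a == b */
}
```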

No, unfortunately in kernel land this isn't buggy. It's perfectly "valid" to dereference null. You'll read garbage but it won't panic & the null check on the following line prevents any issues (assuming the compiler isn't exploiting UB optimizations).

Sure. I agree with you that, in principle, it's cleaner to put the null check before the dereference. However, it's also important to remember that UB, and compilers optimizing around it, is a relatively "new" phenomenon, and the compiler going out of its way to punish you for it (for a performance improvement that likely doesn't matter in many or most cases) seems punitive and unhelpful for most C/C++ codebases (with the exception of some heavy math or HFT applications, which would probably be better served by their own language or dialect). IIRC the standard was relaxed to allow this in C99, and the consequences weren't well understood until 10-15 years later, once compiler authors realized what the standard allowed them to do. This is also the most innocuous case; there are plenty of even subtler issues that UB can cause.

While I agree there can be nuance in viewpoints, I'm not sure what the three sides you're talking about are. Generally there's the camp that favors UB and optimizing around it, which is largely populated by compiler authors and the "performance at all costs" crowd. I'd say Chris Lattner, Linus and DJ Bernstein hold a largely similar point of view (UB = a bad decision by the C/C++ standards bodies) and probably only differ on the solution (Chris = "switch to a different language", Linus = "give me back the behavior before UB was introduced", DJB = "define all current UB").

Personally, while I'm a fan of performance, I'm not convinced that UB-based optimizations have proven enough of a performance gain that they shouldn't sit behind a flag like `-ffast-math`. Most software would benefit from removing most of the UB optimizations, and it's not clear to me that the performance impact would be all that significant.

One of my least favorite bugs came from a loop that iterated from *p to *(p+n) (exclusive). For n=0 and any non-null p, the loop body never executes. For n=0 and p=null, the loop body is allowed to execute, because in C even null+0 is undefined behavior, so the result is allowed to be any value.
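
A sketch of the pattern and one way to sidestep it, assuming the loop walks `n` ints starting at `p`: iterating by index never forms `p + n` at all, so a `(NULL, 0)` pair is handled without any pointer arithmetic on null. `sum_range_ptr` and `sum_range_idx` are hypothetical names.

```c
#include <stddef.h>

/* Pointer-bounds version: computes p + n up front. In C, if p is NULL
   this arithmetic is undefined even when n == 0, so the compiler is
   not obliged to treat the loop as empty. */
static long sum_range_ptr(const int *p, size_t n)
{
    long total = 0;
    for (const int *it = p, *end = p + n; it != end; ++it)
        total += *it;
    return total;
}

/* Index version: p is only accessed as p[i] with i < n, so for
   n == 0 the pointer is never touched and a NULL p is harmless. */
static long sum_range_idx(const int *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}
```

An explicit `if (n == 0) return 0;` before the pointer version works too; the index form just makes the guard structural.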

Nice example! Seems solvable by printf debugging, though.

The fact that it's fixable doesn't mean it's fun…

Miscompilations due to UB are normally silent and can remove checks written into the code for security purposes, e.g., checks for signed integer overflow or for null pointers.
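
As an example of the signed-overflow case: a check written as `x + 1 < x` relies on wraparound, which is undefined for signed `int`, so an optimizing compiler may fold it to false and drop it. A sketch of a check that stays valid compares against `INT_MAX`/`INT_MIN` before adding; `add_one_would_overflow_naive` and `add_checked` are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed wraparound, which is UB: the compiler may assume
   x + 1 never overflows and compile this as `return false;`. */
static bool add_one_would_overflow_naive(int x)
{
    return x + 1 < x;
}

/* Well-defined: decides whether a + b overflows before computing it.
   Stores the sum in *out and returns true only when it fits in int. */
static bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* sum would not fit in an int */
    *out = a + b;
    return true;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which performs the same test without hand-written bounds.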

They're not harder to debug. Debugging them isn't the issue. The problem is knowing that the code in the editor doesn't correspond to the code under execution.