Hacker News
by CamperBob2 3055 days ago
> An explanation of how undefined behaviour is possible would be welcome.

Short summary: compilers that enable strict aliasing assumptions by default can introduce bugs into otherwise-working code by assuming that a pointer to one type will never refer to an object of another (incompatible) type. This assumption is perilous but considered worthwhile by many, since it lets the compiler keep values in registers and reorder loads and stores, enabling early-out optimizations and CPU pipeline scheduling that would otherwise be unavailable at compile time.
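A minimal sketch of the assumption described above, with hypothetical function names. In `store_then_read`, `int32_t` and `float` are incompatible types, so a strict-aliasing compiler may assume the store through `f` cannot change `*i` and return the cached value without reloading it. If a caller passes the same storage through both pointers, the program has undefined behavior. `float_bits` shows the well-defined alternative for reinterpreting bits: copy them with `memcpy`. It assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (true on IEEE 754 platforms). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Under strict aliasing the compiler may assume *f never overlaps *i,
   because int32_t and float are incompatible types. */
int32_t store_then_read(int32_t *i, float *f) {
    *i = 1;
    *f = 0.0f;   /* assumed not to modify *i */
    return *i;   /* may be compiled as a plain `return 1;` */
}

/* Well-defined way to inspect a float's bit pattern: copy the bytes
   instead of dereferencing a uint32_t pointer to the float. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

With distinct objects both functions are well defined. With GCC, `-fno-strict-aliasing` turns the assumption off.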

More specifically, compilers may make different decisions about the appropriateness of aliasing optimizations depending on the information they have available. If a function that accepts potentially aliased pointers lives in its own C file, then while compiling its callers the compiler sees only the function's declaration, and the two are connected later by the linker. So the compiler may not be as aggressive about its aliasing-safety assumptions as it would be if the function and all of its callers were present in the same file. Under these conditions, a single-header library can result in "riskier" optimizations that the compiler wouldn't attempt if the same code resided in its own translation unit.
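For readers unfamiliar with the single-header pattern, here is a minimal sketch in the style of Sean Barrett's stb libraries. The names `mylib_scale` and `MYLIB_IMPLEMENTATION` are hypothetical. Normally everything below the first `#define` would sit in a header, and exactly one `.c` file would define the implementation macro before including it. Here the define comes first, so the function body is compiled in the same translation unit as its callers and the optimizer can see both.

```c
/* This translation unit "owns" the implementation. Other files would
   include the header without this define and see only the declaration. */
#define MYLIB_IMPLEMENTATION

/* ---- contents of a hypothetical mylib.h ---- */
#ifndef MYLIB_H
#define MYLIB_H
#include <stddef.h>

/* Declaration: what every includer sees. */
void mylib_scale(float *dst, const float *src, size_t n, float k);

#endif /* MYLIB_H */

#ifdef MYLIB_IMPLEMENTATION
/* Definition: compiled alongside the calling code, so the optimizer may
   inline it and reason about the actual pointers passed at each call. */
void mylib_scale(float *dst, const float *src, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
#endif /* MYLIB_IMPLEMENTATION */
```

Calling `mylib_scale(a, a, n, k)` to scale in place is fine here because neither parameter is declared `restrict`.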

My understanding of Sean's position is that this breaches an implicit contract between the programmer and the compiler of a systems-level language. I agree, and I think the standard should have included a keyword -- or the compiler authors a flag -- to allow people to opt in to these sorts of optimizations rather than requiring them to opt out. It is way too late to change the way C works by default by doing stuff like this.

This is an oversimplification, but I think it's what the Musl author is getting at. My guess is that if he knew Sean, he'd be a lot slower to accuse Sean of being ignorant of any particular aspect of his own work. Still, while the Musl author's dismissal amounts to FUD in the absence of any specific examples of "undefined behavior," there are some good points on both sides of the argument.

My own take, which is unfortunately all too easy to back up with specific historical examples, is that it's inappropriate to do anything that makes C/C++ programming more difficult, more error-prone, or less secure than it already is.


I agree with the last part; someone should ask the Musl author(s) to substantiate this assertion.

> [...] the standard should have included a keyword -- or the compiler authors a flag -- to allow people to opt in to these sorts of optimizations rather than requiring them to opt out.

As an amateur Linux kernel hacker, I see hacks in the kernel that work around compiler bugs and unexpected behavior caused by compilers deviating from the standards. The rants on LKML seem to assign most of the blame to the gcc authors. Here is one of Linus' (many) denunciations of gcc:

https://lkml.org/lkml/2003/2/26/158

But see also Andrew T. (https://stackoverflow.com/a/2771041), who claims, if I understand correctly, that strict aliasing was already part of the C89/C90 standard but that compiler authors didn't implement the standard correctly.
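The aliasing rule in the C standard lists the lvalue types through which an object's stored value may be accessed, and one of the permitted types is a character type. Here is a minimal sketch of an access that stays inside the rule, using a hypothetical function name. Reading a `uint32_t` through `unsigned char *` is allowed. Reading it through, say, a `float *` would not be.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of any object. Access through unsigned char is always
   permitted, so this does not violate strict aliasing. */
unsigned byte_sum(const void *obj, size_t size) {
    const unsigned char *p = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```

The sum does not depend on byte order, so `byte_sum` on a `uint32_t` holding `0x01020304` gives 10 on both little- and big-endian machines.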

> It is way too late to change the way C works by default by doing stuff like this.

One thing that I am confused about in your explanation is this:

> Under these conditions, a single-header library can result in "riskier" optimizations that the compiler wouldn't attempt if the same code resided in its own translation unit.

How exactly does a compiler generate "riskier" optimizations from a single-header library as opposed to separate translation units? I fail to understand how, after the preprocessing phase, this would be any less safe.

> How exactly does a compiler generate "riskier" optimizations from a single-header library as opposed to separate translation units? I fail to understand how, after the preprocessing phase, this would be any less safe.

For instance, the optimizer might conclude that it's safe to either elide an inline function call or compile it very differently if it sees that you're referring to the same object in two separate parameters.

If the function is implemented in a separate file, the optimizer has to assume that the function does something it doesn't know about.
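Here is a small illustration of why it matters whether two parameters refer to the same object. The function names are hypothetical. C99's `restrict` qualifier is brought in only to spell out the no-alias assumption explicitly; it is not something the comment above mentions. Without `restrict`, the compiler must reload `*b` after the first store, because `a` and `b` may point to the same `int`. With `restrict`, the caller promises they don't, so the compiler may load `*b` once.

```c
/* No restrict: a and b may alias, so *b is re-read after the first store.
   With x == 1, add_twice(&x, &x) yields 2, then 4. */
void add_twice(int *a, const int *b) {
    *a += *b;
    *a += *b;
}

/* restrict: the caller promises a and b never refer to the same object, so
   the compiler may compute *a += 2 * *b with a single load of *b.
   Passing the same object for both parameters is undefined behavior. */
void add_twice_restrict(int *restrict a, const int *restrict b) {
    *a += *b;
    *a += *b;
}
```

When the definition and the call site are in the same translation unit, the optimizer can see which objects are actually passed. When the definition lives in another file, it can see neither.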

Thank you. If you have time, please indulge me with two more questions:

1. In some libraries (SQLite, for instance, and I think libev too) the authors have a script that "amalgamates" all sources into a single translation unit. Their reasoning is that a compiler with full visibility of the source can do global/interprocedural optimization that would not be possible otherwise. Is there any sense in this, if it is practical to do so for a small to moderate size library?

2. Please tell me what I should read so I can reach the same level of understanding that you have. <not a question>
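The amalgamation idea in question 1 can be sketched by hand. What would normally be two source files are concatenated into one translation unit. The file and function names are hypothetical, and this is not SQLite's actual build script. One concrete effect is that an internal helper no longer needs external linkage to be shared across files. It can be `static`, which tells the compiler it has seen every caller.

```c
/* ---- formerly a hypothetical util.c ---- */
/* In a multi-file build this helper would need external linkage so that
   api.c could call it. Inside one translation unit it can be static, and
   the compiler is free to inline it or drop the out-of-line copy. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* ---- formerly a hypothetical api.c ---- */
/* Public entry point: the only symbol this "library" needs to export. */
int lib_to_byte(int v) {
    return clamp_int(v, 0, 255);
}
```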

1. It's almost certain that the speed increase you'd get from intentionally merging a lot of source files into one is less than what you could achieve with a more intelligent profile-based refactoring approach. I wouldn't have a very high opinion of the approach you describe, not knowing any more about those libraries or their authors' motivations.

2. Read a lot of C code written by people like Sean Barrett. You would just pick up a lot of bad habits from mine. :)