by Sagiri 3511 days ago
If a particular compiler specified that casting pointers of wrong alignments causes a segfault, it'd be perfectly acceptable to rely on that behavior. The standard would consider it UB, but that compiler has defined that behavior sufficiently.

Note, though, that a compiler simply doing a particular thing now isn't good enough to specify it in the sense that I mean. The compiler writers would have to explain (in a blog post or the like) the behavior and that they plan to keep that behavior in all future versions.

6 comments

Technically, if a compiler specifies how it handles undefined behavior (as opposed to implementation-defined behavior, which it must define), it stops being a C compiler :-)

I also think it is fairly unlikely that any C compiler will say anything about how it handles undefined behaviour, because it would mean it has to generate awfully inefficient code. For example, a compiler could not optimize away most pointer dereferencing code if it promised that dereferencing odd addresses segfaults.

Yes, such checks might add at most a few percent to a normal program's running time, but add in all the other corner cases (int overflow, boundary checks, etc.) that also add a few percent, and before you know it your program runs at half the speed it could run at. If you find that acceptable, you shouldn't be writing C in the 21st century.

The standard explicitly permits documented behaviour, even though it doesn't have to, because "undefined" covers that option, along with anything else you care to imagine: http://port70.net/~nsz/c/c11/n1570.html#3.4.3
> If a particular compiler specified that casting pointers of wrong alignments causes a segfault, it'd be perfectly acceptable to rely on that behavior.

This is a great way to make your programs "fun" to port to new platforms with new compilers in terrifyingly subtle ways. I prefer not to recommend this approach to solving specific cases of undefined behavior, although if you happen to disable strict aliasing (with e.g. -fno-strict-aliasing) as an additional layer of defensive paranoia, I'm not necessarily against that.
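For the alignment case specifically, one portable alternative to relying on a compiler's documented behavior is to avoid the cast altogether. Here is a minimal sketch; `load_u32` is a hypothetical helper name, not something from the article or any library:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from any address, aligned or not.
   memcpy has no alignment requirement and does not run afoul of the aliasing
   rules, so this is well defined on any conforming implementation.
   The value is read in the host's byte order. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The non-portable form the thread is arguing about would be:
       uint32_t v = *(const uint32_t *)p;
   which is undefined when p is not suitably aligned for uint32_t. */
```

Mainstream optimizing compilers usually turn a fixed-size `memcpy` like this into a plain load where the target allows it, but that is an optimization, not a guarantee.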

You can also defensively add a quick test to your program's startup code and unit tests. Startup will take a tiny bit longer, but those porting your code will be thankful if they hit the problem, doubly so if you manage to emit a useful diagnostic.
That would not work, since the test itself may not cause any problems while the compiler finds more optimization opportunities in the real program (just as the beginning of the function did not have a problem, but the for loop did).

Just don't use undefined behavior.

1. You write a few versions of a program.

2. Users report that it started crashing sometimes in version V.

3. After lots of debugging, you discover an input that reliably crashes your program after 30 minutes.

4. A bit later, you discover that your compiler started compiling function f so that it no longer works with unaligned data/buffers of exactly 8 bytes/whatever.

5. At the start of main, you add a dummy call to f with data that reliably crashes if your compiler decides to do that optimization again.

6. The program has become worse: it now always crashes, independent of its input, but you don't have to wait 30 minutes before finding out. That makes it way less likely that you ship a binary again that has the problem. It also makes it easier to tweak source code/compiler flags/whatever until the problem disappears.

Is that perfect? Absolutely not; it is more of a last resort. But depending on the costs of crashing at startup versus those of sometimes crashing halfway through a run, it can be an improvement.

(this technique also can be used when your code hits compiler bugs)
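As a rough sketch of step 5, assuming f is something like a byte-summing routine (`sum_bytes` and `startup_self_test` are made-up names for illustration, and the 8-byte unaligned input is just the example from step 4):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the function f from the steps above:
   sums the bytes of a buffer. */
static unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Runs f on the input shape that once triggered the problem (here: exactly
   8 bytes starting at an odd offset) and compares against a known answer.
   Returns 0 on success, -1 after printing a diagnostic on failure. */
static int startup_self_test(void)
{
    unsigned char storage[9];
    memset(storage, 1, sizeof storage);
    if (sum_bytes(storage + 1, 8) != 8u) {
        fputs("self-test failed: sum_bytes gives a wrong result for "
              "unaligned 8-byte input\n", stderr);
        return -1;
    }
    return 0;
}
```

You would call `startup_self_test()` at the top of `main` and exit if it returns nonzero. The objection raised earlier still applies: the compiler may well compile this call differently from the real call sites, so a passing self-test proves nothing about them.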

Well, yeah. If you have any reason to suspect that you will need to run your code on other platforms or compile with other compilers, don't do compiler-specific things. Just do it cross-platform the first time.

Hell, even if you don't think you'll ever need to do that, you should still avoid doing things that are platform or compiler specific.

The thing is, many compilers just assume that undefined behaviour won't happen, without defining any particular behaviour. And testing for something like pointer alignment on architectures that silently allow pointer misalignment is really, really expensive.

That said, you can use -fsanitize=undefined to verify correctness of a program (as far as the specification is concerned). Just be prepared for it being a bit slow.
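To make that concrete, here is a small sketch using signed overflow, one of the cases the sanitizer instruments. Both function names are hypothetical. Building with something like `cc -fsanitize=undefined prog.c` (GCC or Clang; `prog.c` is a placeholder file name) should produce a runtime report if `unchecked_add` is ever called with operands whose sum overflows:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds two ints, reporting overflow instead of
   triggering it. The range test runs before the addition, so the
   addition itself can never overflow. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* The unchecked form: undefined behavior whenever the mathematical sum
   does not fit in an int. This is the kind of operation that
   -fsanitize=undefined checks at run time. */
static int unchecked_add(int a, int b)
{
    return a + b;
}
```

The sanitizer only sees the executions you actually run, so it complements rather than replaces writing the checked form where overflow is possible.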

It's not possible to determine all possible undefined behavior in a C program. -fsanitize=undefined is best-effort.
I believe you are confusing undefined behavior and implementation defined behavior. Undefined behavior is illegal under all compilers, and all bets are off if you do it. Implementation defined behavior is always legal, but different compilers are allowed to do different things.
I'm not.

Undefined behavior is 'anything goes'. An implementation can choose a particular behavior that you can rely on for a particular case of UB, because, if the only rule is that 'anything goes', it doesn't violate that rule.

I'll admit that compilers don't generally do that - because specifying it could lead to fewer optimizations. But I did say "if", and there's no reason they couldn't do so in principle.

One could imagine a compiler with an extremely strict debug mode that traps on a number of situations that the standard deems undefined behavior via a segfault, in order to help people avoid relying on UB. Again - saying something like "casting misaligned pointers causes a segfault on [system]" would in no way violate a standard that says "casting misaligned pointers can do anything", because segfaulting falls under the umbrella of anything.
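You can approximate that kind of strict mode by hand today. A minimal sketch, with some loud assumptions: `is_aligned` and `to_u32_ptr_or_trap` are invented names, it calls `abort()` rather than raising an actual segfault, and the low-bits test assumes that converting a pointer to `uintptr_t` yields a flat address, which the standard leaves implementation-defined:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Nonzero when p satisfies the alignment `align`, which must be a power
   of two. Assumes the integer value of a pointer reflects its address. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical debug-mode conversion: stops the program on a misaligned
   pointer instead of performing the undefined conversion. */
static const uint32_t *to_u32_ptr_or_trap(const void *p)
{
    if (!is_aligned(p, _Alignof(uint32_t))) {
        fputs("misaligned pointer converted to uint32_t *\n", stderr);
        abort();
    }
    return (const uint32_t *)p;
}
```

This is exactly the per-conversion check whose cost was being discussed above, which is why you would want it in debug builds only.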

I think you're misinterpreting the fact that the results of undefined behavior can be ignored by a compiler for a requirement that it must be ignored by a compiler.

Undefined behavior is not illegal. The compiler can do anything with undefined behavior, including exactly what the author expected.
It is illegal, for any reasonable definition of illegal. See my comment from earlier in the year: https://news.ycombinator.com/item?id=10840497
"Illegal" seems stronger to me than "not strictly conforming", I guess (and so does "well-formed", for that matter). But I think we basically agree.
"If a particular compiler specified that casting pointers of wrong alignments causes a segfault" Sure it does. It says using undefined behaviour can cause demons to fly out of your nose, among other things (including but not limited to segfaulting).
If the semantics of casting pointers of wrong alignments were defined to be demons flying out of your nose, it would be perfectly acceptable to rely on that behavior.
If.

(famous reply from Sparta, Laconia, to Philip II of Macedon)