| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by hxhxhrra 1306 days ago

As others are pointing out, the C standard does allow this. There is no safe way to check for undefined behavior (UB) after it has happened, because the whole program is immediately invalidated.

This has caused a Linux kernel exploit in the past [1], with GCC removing a null pointer check after a pointer had been dereferenced. Null pointer dereferences are UB, thus GCC was allowed to remove the following check against null. In the kernel, accessing a null ptr is technically fine, so the Linux kernel is now compiled with -fno-delete-null-pointer-checks, extending the list of differences between standard C and Linux kernel C.

[1]: https://lwn.net/Articles/342330/

7 comments

pjc50 1306 days ago

> because the whole program is immediately invalidated.

The problem is the program isn't invalidated, it's compiled and run.

The malicious compiler introducing security bugs from Ken Thompson's "Reflections on Trusting Trust" is real, and it's the C standard.

I will grant that trying to detect UB at runtime may impose serious performance penalties, since it's very hard to do arithmetic without risking it. But at compile time? If a situation has been statically determined to invoke UB that should be a compile time error.

Also, if an optimizer determines that an entire statement has no effect, that should be at least a warning. (C lack C#'s concept of a "code analysis hint" which have individually configurable severity levels).

tpush 1306 days ago

> If a situation has been statically determined to invoke UB that should be a compile time error.

That's simply not how the compiler works.

There is (presumably, I haven't actually looked) no boolean function in GCC called is_undefined_behavior(). It's just that each optimization part of the compiler can (and does) assume that UB doesn't happen, and results like the article's are then essentially emergent behavior.

See also: https://blog.regehr.org/archives/213

MichaelBurge 1306 days ago

C++ bans undefined behavior in constexpr, so you can force GCC to prove that code has no undefined behavior by sprinkling it in declarations where applicable:

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...

account42 1306 days ago

Constant-evaluated expressions with undefined behavior are ill-formed but constexpr annotated functions which may in some invocations result in undefined behavior are not.

benj111 1306 days ago

It is undefined behaviour if I write GCC --hlep

Does that mean it's acceptable for GCC to reformat my hard drive?

Just because something is UD doesn't give anyone a license to do crazy things.

If I misspell --help I expect the program to do something reasonable. If I invoke UD I still expect the program to do something reasonable.

Removing checks for an overflow because overflows 'can't happen' is just crazy.

UD is supposed to allow C to be implemented on different architectures if you don't know whether it will overflow to INT_MIN it makes sense to leave the implementation open. If I, the user knows what happens when an int overflows then I should be able to make use of that and guard against it myself. A compiler undermining that is a bug and user hostile.

tpush 1306 days ago

> It is undefined behaviour if I write GCC --hlep

No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations.

> UD is supposed to allow C to be implemented on different architectures if you don't know whether it will overflow to INT_MIN it makes sense to leave the implementation open. If I, the user knows what happens when an int overflows then I should be able to make use of that and guard against it myself.

I think you're confusing UB with unspecified and implementation defined behavior. It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

formerly_proven 1306 days ago

This has come up before, because, in some technical sense, the C standard does indeed not define what a "gcc" is, so "gcc --help" is undefined behavior according to the C standard, because the C standard does not define the behavior. By the same token, instrument flight rules are undefined behavior.

A slightly less textualist approach to language recognizes that when we talk about C and UB, we mean behavior, which is undefined, of operations otherwise defined by the C standard.

tpush 1306 days ago

I think this is confusing undefined behavior with behavior of something that is undefined. And either way, the C standard explicitly applies to C programs, so even this cute "textualist" interpretation would be wrong, IMO.

benj111 1306 days ago

Do you know what a metaphor is?

No GCC --hlep isn't in the c standard.

But it is a simple example to illustrate how programs react when it receives something that isn't in the spec. GCC could do anything with Gcc --hlep just like it could do anything with INT_MAX + 1. That doesnt mean that all options open to it are reasonable.

If I typed in GCC --hlep I would be reasonably pissed that it deleted my hard drive. You pointing out that GCC never made any claims about what would happen if I did that doesn't make it ok.

If you come across UD, there's reasonable and unreasonable ways to deal with that. Reformatting your hard drive which is presumably allowed by the C standard isn't reasonable. I would contend that removing checks is also unreasonable.

benj111 1306 days ago

>No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations

What should happen when I invoke --hlep then? The program could give an error, could warn that it's an unrecognised flag. Could ask you if you meant --help. Infer you mean help and give you that, or it could give you a choo Choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformating your hard drive isn't the most reasonable thing to do.

>I think you're confusing UB with unspecified and implementation defined behavior

Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

>It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

C gives you enough rope to hang yourself. It isn't required for GCC to tie the noose and stick your head in it though.

I think you're confusing UB with unspecified and implementation defined behavior

tpush 1306 days ago

> What should happen when I invoke --hlep then? The program could give an error, could warn that it's an unrecognised flag. Could ask you if you meant --help. Infer you mean help and give you that, or it could give you a choo Choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformating your hard drive isn't the most reasonable thing to do.

I honestly don't understand the point of this paragraph.

> Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

Yes, you are confused about that. UB is precisely the kind of behavior where the C standard deemed it unsuitable to define as implementation defined or whatever, and it usually has really good reasons to do so. You could look them up instead of asking rhetorically.

> I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; It's optimization passes are coded with the assumption that UB can't happen. The ability to do that is fundamentally the point of UB. Situations like in the article are not a specific act of the compiler to screw you in particular, but an emergent result.

Additionally, I think you you're also confusing Undefined Behavior with 'behavior of something that is undefined'. These are not the same things.

radford-neal 1306 days ago

> It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm glad I don't live in your country, where the C standard has been incorporated into law, making it illegal for compiler writers to do things that are helpful to programmers and end users, but aren't required by the standard.

roblabla 1306 days ago

> UD is supposed to allow C to be implemented on different architectures

No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures. In those cases, the implementation must define the behavior itself, and stick with it. UB, on the other hand, exists for compiler authors to optimize.

If you want to be mad at someone, be mad at the C standard for defining so much stuff as UB instead of implementation-defined behavior. Integer overflow should really be implementation-defined instead.

benj111 1306 days ago

>No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures.

Is it? We're talking about integer overflow here.

I wasn't in the meetings when writing all the c standards. I'm not convinced this is purely an optimisation thing though.

I would guess the story is more.

Interested party X: "can integer overflow do X?"

Party Y: "no because our processor doesn't work like that.

Party Z: "and it breaks K and R"

Party X: "how about implementation defined?"

Party A: "but our compiler targets 5 different processors"

Party B: "plus that precludes certain optimisations"

astrange 1306 days ago

Not only to optimize but to write safety tools. If you defined all the behavior, and then someone used some rare behavior like integer overflow by accident, it'd be harder to detect that since you have to assume it was intentional.

flohofwoe 1306 days ago

> UB, on the other hand, exists for compiler authors to optimize.

Was this really the original reason why there's UB in the C standard, or has this been retconned by 'malicious compiler authors'? ;)

sanxiyn 1306 days ago

It is the original reason. For example, register allocation is possible because stack smashing is UB.

fps_doug 1306 days ago

But then again, UB doesn't mean the compiler author can't treat it as implementation-defined and do something reasonable.

tuyiown 1306 days ago

You're getting it backward. UB doesn't immediately stop compilation only due to implementation defined backward compatibility, just because you don't want to break compilation of existing programs each time the compiler converges to the C spec and identified an implementation of undefined behavior.

And since you want some cross-compiler compatibility, you also import's third parties implementation defined UB.

This is not some conceptual reasonable decision, the proper way would be to throw out compilation on each UB behavior. The reality is that the proper way would be too harsh on existing codebase, making people use a less strict compiler or not updating version, which are non-desirable effects for compilers writers.

dzaima 1306 days ago

And sanitizers that throw warnings on undefined behavior do indeed exist.

hoseja 1306 days ago

>UB, on the other hand, exists for compiler authors to optimize.

s/exists for/has been exploited by/g

The worst part is the optimizations aren't even that significant. (I recall a blog post of somebody testing this but I can't find it rn)

tmtvl 1306 days ago

It is undefined behaviour if I write GCC --hlep

Well no, it's a compilation error, you need at the very least a semicolon after hlep and from there on it depends on what GCC is. If it's a function you need parentheses around --hlep, if it's a type you need to remove the --, if it's a variable you need to put a semicolon after it,...

Because GCC is all-caps I'm guessing it's a macro, so here's an example of how you could write it (though it won't be UB): https://godbolt.org/z/dYMddrTjj

benj111 1306 days ago

I'm not sure if you're supporting my pov by showing the absurdity of the other position???

Yeah sure, if my phone auto incorrects gcc to GCC then that is technically meaningless so you're completely free to interpret my comment how you want.

..... Although..... GCC stands for GNU Compiler Collection so it can be reasonably capitalised, so maybe then, rather than saying anything goes we should do something reasonable because then you aren't left saying something really stupid if you're wrong???

gpderetta 1306 days ago

Parent point is when the standard talks about UB it refers about translating C code. So parent cheekly interpreted your comment about command line flags (which are outside the remit of the standard) as code instead. I thought it was fitting.

dzaima 1306 days ago

The example here doesn't have compile-time known undefined behavior though; as-is, the program is well-formed assuming you give it safe arguments (which is a valid assumption in plenty of scenarios), and the check in question is even kept to an extent. Actual compile-time UB is usually reported. (also, even if the compiler didn't utilize UB and kept wrapping integer semantics, the code would still be partly broken were it instead, say, "x * 0x1f0 / 0xffff", as the multiplication could overflow to 0)

The problem with making the compiler give warnings on dead code elimination (which is what deleting things after UB really boils down to) is that it just happens so much, due to macros, inlining, or anything where you may check the same condition once it has already been asserted (by a previous check, or by construction). So you'd need some way to trace back whether the dead-ness comes directly from user-written UB (as opposed to compiler-introduced UB, which a compiler can do if it doesn't change the resulting behavior; or user-intended dead code, which is gonna be extremely subjective) which is a lot more complicated. And dead code elimination isn't even the only way UB is used by a compiler.

nwellnhof 1306 days ago

> also, even if the compiler didn't utilize UB and kept wrapping integer semantics, the code would still be partly broken were it instead, say, "x * 0x1f0 / 0xffff", as the multiplication could overflow to 0

That's the most important point! You simply cannot detect overflow when multiplying integers in C after the fact. This is not GCC's fault.

I agree that some of the optimizations exploiting UB are too aggressive, but the article presents a really bad example.

Someone 1306 days ago

> If a situation has been statically determined to invoke UB that should be a compile time error.

But you typically can’t prove that. There’s lots of code where you could prove it might happen at runtime for some inputs, but proving that such inputs occur would, at least, require whole-program analysis. The moment a program reads outside data at runtime, chances are it becomes impossible.

If you want to ban all code that might invoke it it boils down to requiring programmers to think about adding checks around every addition, multiplication, subtraction, etc. in their code, and add them to most of them. Programmers then would want the compiler to include such checks for them, and C would no longer be C.

If, as you seem to say, you want to ban a subset that’s easily provable, I think enabling all warnings already does that. See for example https://clang.llvm.org/docs/DiagnosticsReference.html#wargum..., https://clang.llvm.org/docs/DiagnosticsReference.html#warray..., https://clang.llvm.org/docs/DiagnosticsReference.html#winteg... , https://clang.llvm.org/docs/DiagnosticsReference.html#wcompa...

andrewaylett 1306 days ago

C will accept every valid program, at the cost of also accepting some invalid programs. Rust will reject every invalid program, at the cost of also rejecting some valid ones.

("unsafe" (aka "trust me" mode) means that's not quite true, and so do some of the warnings and errors that you can enable on a C compiler, but it's close enough)

pjc50 1306 days ago

> But you typically can’t prove that. There’s lots of code where you could prove it might happen at runtime for some inputs, but proving that such inputs occur would, at least, require whole-program analysis. The moment a program reads outside data at runtime, chances are it becomes impossible.

No, I specifically ruled out doing that in my comment.

I was referring to the situation where a null check was deleted because the compiler found UB through static analysis.

(Or specifically, placing a null check after a possibly-null usage. It is wrong to assume that after possibly-null usage the possibly-null variable is definitely-null.)

dwattttt 1306 days ago

As I recall, the compiler didn't know it had found undefined behaviour. An optimisation pass saw "this pointer is deferenced", and from that inferred that if execution continued, the pointer can't be null.

If the pointer can't be null, then code that only executes when it is null is dead code that can be pruned.

Voila, null check removed. And most relevantly, it didn't at any point know "this is undefined behaviour". At worst it assumed that dereferencing a null would mean it wouldn't keep executing.

hoseja 1306 days ago

It removed redundant check then, why not warn about that? gcc -Wpedantic even warns about empty statements fcol.

joosters 1306 days ago

The compiler didn't find UB. What it saw was a pointer dereference, followed by some code later on that checked if the pointer was null.

Various optimisation phases in compilers try to establish the possible values (or ranges) of variables, and later phases can then use this to improve calculations and comparisons. It's very generic, and useful in many circumstances. For example, if the compiler can see that an integer variable 'i' can only take the values 0-5, it could optimise away a later check of 'i<10'.

In this specific case, the compiler reasoned that the pointer variable could not be zero, and so checks for it being zero were pointless.

caf 1305 days ago

Yes - and the original post here is the same:

    if (x < 0)
        return 0;

The compiler now knows x's possible range is non-negative.

    int32_t i = x * 0x1ff / 0xffff;

A non-negative multiplied and divided by positive numbers means that i's possible range is also non-negative (this is where the undefinedness of integer overflow comes in - x * 0x1ff can't have a negative result without overflow occurring).

    if (i >= 0 && i < sizeof(tab)) {

The first conditional is trivially true now, because of our established bounds on i, so it can just be replaced with "true". This is what causes the code to behave contrary to the OP's expectations: with his execution environment in the overflow case we can end up with a negative value in i.

sokoloff 1306 days ago

It is probably more precise to say “if the pointer is null, then it doesn’t matter what I do here, so I am permitted to eliminate this” than to say that it can’t be null here. (It can’t be both null and defined behavior.)

joosters 1306 days ago

I'm not sure that's right. The compiler isn't tracking undefined behaviour, it is tracking possible values. It just happens that one specific input into determining these values is the fact "a valid program can't dereference a null pointer", so if the source code ever dereferences a pointer, the compiler is free to reason that the pointer cannot therefore be null.

In essence, the compiler is allowed to assume that your code is valid and will only do valid things.

andrewaylett 1306 days ago

Consider function inlining, or use of a macro to for some generic code. For safety, we include a null check in the inlined code. But then we call it from a site where the variable is known to not be null.

The compiler hasn't found UB through static analysis, it has found a redundant null check.

xenadu02 1306 days ago

> I was referring to the situation where a null check was deleted because the compiler found UB through static analysis.

You can say that but in practice -Onone is fairly close to what you're asking for already. Most people are 100% unwilling to live with that performance tradeoff. We know that because almost no one builds production software without optimizations enabled.

The compiler is not intelligent. It just tries to make deductions that let it optimize programs to run faster. 99.999% of the time when it removes a "useless" null check (aka branch that has to be predicted and eat up branch prediction buffer space and bloats up the number of instructions) it really is useless. The compiler can't tell the difference between the useless ones and security critical ones because all of them look the same and are illegal by the rules of the language.

Even if you mandate that null checks can't be removed that doesn't fix all the other situations where inserting the relevant safety checks have huge perf costs or where making something safe reduces to the halting problem.

FWIW I agree that the committee should undertake an effort to convert UB to implementation-defined where possible... for example just mandate twos complement integer representations and make signed integer overflow ID.

To illustrate the complexity: most loops end up using an int which is 32-bit on most 64-bit platforms so if you require signed integer wrapping that slows down all loops because the compiler must insert artificial checks to make the 64-bit register perform 32-bit wrapping and we can't change the size of int at this point.

caf 1305 days ago

FWIW I agree that the committee should undertake an effort to convert UB to implementation-defined where possible... for example just mandate twos complement integer representations and make signed integer overflow ID.

To accomodate trapping implementations you'd have to make it "implementation-defined or an implementation-defined signal is raised" which it happens is exactly the wording for when an out-of-range value is assigned to a signed type. In practice it means you have to avoid it in your code anyway because "an implementation-defined signal is raised" means "your program may abort and you can't stop it".

gpderetta 1306 days ago

But again, the compiler did not find UB through static analysis. The compiler inferred that the pointer could not be null and removed a redundant check.

For example you would you not expect a compiler to remove a redundant bound check if it can infer that an index can't be out of range?

vintermann 1306 days ago

The compiler made a dangerous assumption that the standard permits ("the author surely has guaranteed, through means I can't analyze, that this pointer will never be null").

Then it encountered evidence explicitly contradicting that assumption (a meaningless null check), and it handled it not by changing its assumption, but by quietly removing the evidence.

> For example you would you not expect a compiler to remove a redundant bound check if it can infer that an index can't be out of range?

If it can infer it from actually good evidence, sure. But using "a pointer was dereferenced" as evidence "this pointer is safe to dereference" is comically bad evidence that only the C standard could come up with.

gpderetta 1306 days ago

> using "a pointer was dereferenced" as evidence "this pointer is safe to dereference" is comically bad evidence

Do you think the compiler would be right to remove the second check here?

   if (!x) std::abort();
   if (!x) return;
   ... = *x;

What about changing std::abort with the following?

   [[noreturn]] void my_abort();

How's that different form a check after dereferencing a pointer? In both cases the check can be removed because dataflow or control flow analysis.

What if my_abort returns instead? Or another thread changes x after the fact?

vintermann 1306 days ago

You typically can't prove it, but if and when you can prove it, you should definitively warn about it or even refuse to compile.

Things like that meaningless null check mentioned, can definitively be found statically (the meaningless arithmetic sanity check in OP's example, I'm not so sure, at least not with C's types).

Someone 1306 days ago

So, how much effort should the standard require a compiler to make for “if and when you can prove it”? You can’t, for example, reasonably require a compiler to know whether Fermat’s theorem is true if that’s needed to prove it.

There are languages that specify what a compiler has to do (e.g. Java w.r.t. “definite assignment” (https://docs.oracle.com/javase/specs/jls/se9/html/jls-16.htm...)), and thus require compilers to reject some programs that otherwise would be valid and run without any issues, but C chose to not do that, so compilers are free to not do anything there.

vintermann 1306 days ago

Everyone wants to drag nontermination into this, but in the OP's example, the compiler already had proof that a the conditional would never evaluate to true. What you can or can't prove in the bigger picture isn't so interesting when we already have the proof we need right now.

It's just that it used this proof to remove the conditional evaluation (and the branch) instead of warning the user that he was making a nonsensical if statement.

So to the question of "when can we hope to do it" the answer is, "not in all cases, sure, but certainly in this case".

eru 1306 days ago

> The problem is the program isn't invalidated, it's compiled and run.

Anything can happen with undefined behaviour, including exactly what you would expect to happen for five years, and then everything breaks.

Compiling and running as if nothing is amiss is exactly how UB is allowed to look like.

pjc50 1306 days ago

> Compiling and running as if nothing is amiss is exactly how UB is allowed to look like.

Yes, and this is a "billion-dollar mistake" that's responsible for an ongoing flow of CVEs.

(the proposal to replace "undefined" with "implementation-defined" may be the only way of fixing this, and that gets slightly easier to do as the number of actively maintained C implementations shrinks)

imtringued 1306 days ago

Create a Defined-C dialect.

eru 1305 days ago

You can already do that to some extent. There's tons of compiler flags that make C more defined. Eg both clang and gcc support `-fno-strict-overflow` to define signed integer overflow as wraparound according to two's complement.

kazinator 1306 days ago

> and it's the C standard.

No, it's puerile interpretations of the C standard learned from the comp.lang.c newsgroup and such places rather than from engineering work.

xeeeeeeeeeeenu 1306 days ago

>Linux kernel is now compiled with -fno-delete-null-pointer-checks

Like many other large C codebases, it also uses -fno-strict-aliasing and -fno-strict-overflow (which is a synonym for "-fwrapv -fwrapv-pointer").

throwaway81523 1306 days ago

-fwrapv introduces runtime bugs on purpose! The last thing you want is an unexpected situation where n is an integer and n+1 is somehow less than n. And of course that bug has good chances of leading to UB elsewhere, such as a bad subscript. If you want to protect from UB on int overflow, -ftrapv (not -fwrapv) is the only sane approach. Then at least you'll throw an exception, similar to range checking subscripts.

It is sad that we don't get hardware assistance for that trap on any widespread cpu, at least that I know of.

lilllly 1306 days ago

I can easily test if n+1 is < n with fwrapv.

Without you have to do convoluted things like rearranging the expression to unnatural forms (move the addition to the right but invert to subtraction, etc), special case INT_MAX/INT_MIN, and so on - which you then have to hope the compiler is smart enough to optimize, which it often isn't (oh how ironic).

Asooka 1306 days ago

It's not to protect from UB, it's to protect from the optimiser deleting your bounds checks.

caf 1305 days ago

On x86 you can put an INTO instruction after each arithmetic operation to trap if the overflow flag is set.

fps_doug 1306 days ago

We've got a few components written in C that I'm (partially) responsible for. It's mostly maintenance, but for reasons like this I run that code with -O0 in production, and add all those kinds of flags.

I'd be curious to know how much production code today that's written in C is that performance critical, i.e. depends on all those bonkers exploits of UB for optimizations. The Linux kernel seems to do fine without this.

NohatCoder 1306 days ago

I'm fairly confident in declaring the answer to your question: None.

Most programs rarely issue all the instructions that a CPU can handle simultaneously, they are stuck waiting on memory or linear dependencies. An extra compile-out-able conditional typically doesn't touch memory and is off the linear dependency path, which makes it virtually free.

So the actual real-world overhead ends up at less than 1%, but in most cases something that is indistinguishable from 0.

If you care that much about 1% you are probably already writing the most performance critical parts in Assembly anyway.

sanxiyn 1306 days ago

> If you care that much about 1% you are probably already writing the most performance critical parts in Assembly anyway.

I call this hotspot fallacy and it is a common one. This assumes there is relatively small performance critical parts that can be rewritten in assembly. Yes, sometimes there is a hotspot, but by no means always. A lot of people caring about 1% is running gigabytes binary on datacenter scale computer without hotspots.

rocqua 1306 days ago

Thanks for that lwn article.

I had read it a long time ago, and had since forgotten the source. I've spent a few hours trying to find it in bug-trackers. really glad to have the link now, thanks!

matheusmoreira 1306 days ago

The C standard doesn't really matter. Standards don't compile or run code. Only thing that matters is what the compilers do. "Linux kernel C" is a vastly superior language simply because it attempts to force the compiler to define what used to be undefined.

This -fno-delete-null-pointer-checks flag is just yet another fix for insane compiler behavior and it's not the first time I've seen them do it. I've read about the Linux kernel's troube with strict aliasing and honestly I don't blame them for turning it off so they could do their type punning in peace. Wouldn't be surprised if they also had lots more flags like -fwrapv and whatnot.

kazinator 1306 days ago

I don't believe that it does. If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.

"Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Ignoring the situation completely means exactly that: completely. The situation is not being ignored completely if the compilation of something which follows is predicated upon the earlier situation being free of undefined behavior.

OK, so since the situation is not being ignored completely, and translation or execution is not terminated with a diagnostic message, it must be that this is an example of "behaving in a documented manner characteristic of the implementation". Well, what is the characteristic; where is it documented? That part of the UB definition refers to documented extensions; this doesn't look like one.

What is "characteristic of the implementation" is in fact that when you multiply two signed integers together with overflow, that you get a particular result. A predictable result characteristic of how that machine performs the multiplication. If the intent is to provide a documented, characteristics behavior, that would be the thing to document: you get the machine multiplication, like in assembly language.

hxhxhrra 1306 days ago

> I don't believe that it does. If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.

You may dislike it, but that is not how UB in C and C++ works. See [1] for a guide to UB in C/C++ that may already have been posted elsewhere here.

It is a common misconception that UB on a particular operation means "undefined result", but that is not the case. UB means there are no constraints whatsoever on the behavior of the program after UB, often referred to as "may delete all your files". See [2] for a real-world demo doing that.

[1] https://blog.regehr.org/archives/213

[2] https://kristerw.blogspot.com/2017/09/why-undefined-behavior...

kazinator 1306 days ago

I should clarify that I believed all that in 1990-something; I've arrived at a more mature professional opinion in the nearly three decades since.

MaxBarraclough 1303 days ago

> If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.

The C standard imposes no such constraint on undefined behaviour, neither is it the case that real compilers always behave as if it did.

hxhxhrra has already shown this, but here's another good blog post on this kind of thing: https://markshroyer.com/2012/06/c-both-true-and-false/

adrian_b 1306 days ago

Even if this solution cannot be used for the Linux kernel, for user programs written in C the undefined behavior should always be converted into defined behavior by using compilation options like "-fsanitize=undefined -fsanitize-undefined-trap-on-error".

charcircuit 1306 days ago

>the C standard does allow this

The C standard also allows doing it even when there is no UB. C gives implementations a ton of freedom.