by tpush 1306 days ago
> If a situation has been statically determined to invoke UB that should be a compile time error.

That's simply not how the compiler works.

There is (presumably, I haven't actually looked) no boolean function in GCC called is_undefined_behavior(). It's just that each optimization pass of the compiler can (and does) assume that UB doesn't happen, and results like the article's are then essentially emergent behavior.

See also: https://blog.regehr.org/archives/213


C++ bans undefined behavior in constexpr, so you can force GCC to prove that code has no undefined behavior by sprinkling it in declarations where applicable:

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...

Constant-evaluated expressions with undefined behavior are ill-formed but constexpr annotated functions which may in some invocations result in undefined behavior are not.
It is undefined behaviour if I write GCC --hlep

Does that mean it's acceptable for GCC to reformat my hard drive?

Just because something is UB doesn't give anyone a license to do crazy things.

If I misspell --help I expect the program to do something reasonable. If I invoke UB I still expect the program to do something reasonable.

Removing checks for an overflow because overflows 'can't happen' is just crazy.

UB is supposed to allow C to be implemented on different architectures: if you don't know whether it will overflow to INT_MIN, it makes sense to leave the implementation open. If I, the user, know what happens when an int overflows, then I should be able to make use of that and guard against it myself. A compiler undermining that is a bug and user-hostile.

> It is undefined behaviour if I write GCC --hlep

No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations.

> UB is supposed to allow C to be implemented on different architectures: if you don't know whether it will overflow to INT_MIN, it makes sense to leave the implementation open. If I, the user, know what happens when an int overflows, then I should be able to make use of that and guard against it myself.

I think you're confusing UB with unspecified and implementation defined behavior. It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

This has come up before, because, in some technical sense, the C standard does indeed not define what a "gcc" is, so "gcc --help" is undefined behavior according to the C standard, because the C standard does not define the behavior. By the same token, instrument flight rules are undefined behavior.

A slightly less textualist approach to language recognizes that when we talk about C and UB, we mean behavior, which is undefined, of operations otherwise defined by the C standard.

I think this is confusing undefined behavior with behavior of something that is undefined. And either way, the C standard explicitly applies to C programs, so even this cute "textualist" interpretation would be wrong, IMO.
Do you know what a metaphor is?

No, gcc --hlep isn't in the C standard.

But it is a simple example to illustrate how a program reacts when it receives something that isn't in the spec. GCC could do anything with gcc --hlep, just like it could do anything with INT_MAX + 1. That doesn't mean that all options open to it are reasonable.

If I typed in GCC --hlep I would be reasonably pissed that it deleted my hard drive. You pointing out that GCC never made any claims about what would happen if I did that doesn't make it ok.

If you come across UB, there are reasonable and unreasonable ways to deal with that. Reformatting your hard drive, which is presumably allowed by the C standard, isn't reasonable. I would contend that removing checks is also unreasonable.

> I would contend that removing checks is also unreasonable.

Yeah, but removing a null check after a dereference has a solid rationale, so it’s very different from GCC taking it upon itself to format your drive.
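For illustration, here is a minimal C sketch of the pattern being discussed. The function names are made up for this example. In the first function the dereference happens before the check, so an optimizer may assume `p` is non-null from that point on and treat the check as dead code. In the second, the check comes first and stays meaningful.

```c
#include <stddef.h>

/* Hypothetical example: the dereference comes first, so an optimizer
   may assume p is non-null and drop the later check as unreachable. */
int read_then_check(int *p) {
    int value = *p;      /* undefined behavior if p is NULL */
    if (p == NULL) {     /* may be removed: p was already dereferenced */
        return -1;
    }
    return value;
}

/* Checking before the dereference keeps the guard meaningful. */
int check_then_read(int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Whether the check in `read_then_check` is actually removed depends on the compiler and optimization level.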

The general thinking seems to be that UB can do anything so you can't complain, whatever that anything is.

That would logically include reformatting your hard drive.

I definitely disagree with that pov, if you don't accept that UB can result in anything then the line needs to be drawn somewhere.

I would contend that UB stems from the hardware. C won't take responsibility for what the hardware does. Neither will it step in to change what the hardware does. That in turn means that UB means the compiler shouldn't optimise because the behaviour is undefined.

>No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations

What should happen when I invoke --hlep then? The program could give an error, or warn that it's an unrecognised flag. It could ask if you meant --help, or infer you meant help and give you that, or it could give you a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UB doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformatting your hard drive isn't the most reasonable thing to do.

>I think you're confusing UB with unspecified and implementation defined behavior

Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

>It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

C gives you enough rope to hang yourself. It isn't required for GCC to tie the noose and stick your head in it though.


> What should happen when I invoke --hlep then? The program could give an error, or warn that it's an unrecognised flag. It could ask if you meant --help, or infer you meant help and give you that, or it could give you a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UB doesn't mean it's not. If it isn't defined then it's undefined. The question is what is the reasonable thing to do when someone types --hlep. I hope we can agree reformatting your hard drive isn't the most reasonable thing to do.

I honestly don't understand the point of this paragraph.

> Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.

Yes, you are confused about that. UB is precisely the kind of behavior where the C standard deemed it unsuitable to define as implementation defined or whatever, and it usually has really good reasons to do so. You could look them up instead of asking rhetorically.

> I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen. The ability to do that is fundamentally the point of UB. Situations like the one in the article are not a specific act of the compiler to screw you in particular, but an emergent result.

Additionally, I think you're also confusing Undefined Behavior with 'behavior of something that is undefined'. These are not the same things.

>Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen

Which is as wrong as coding GCC to assume --hlep can't happen.

It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

If you don't understand my --hlep example how about: Int mian () {

What should the compiler do there? The same rules apply: should it reformat your hard drive, or warn you that it can't find such a function? There are reasonable and unreasonable ways to deal with behaviour that hasn't been defined.

If I put in INT_MAX + 1 it isn't reasonable to reformat my hard drive. The compiler doesn't have carte blanche to do what it likes just because it's UB. It should be doing something reasonable. To me, removing an overflow check isn't reasonable.

If you want to have a debate about what is reasonable we can have that debate, but if you're going to say UB means anything can happen then I'm just going to ask why it shouldn't reformat your hard drive.

Again, you still don't understand.

> It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

A compiler's handling of UB simply can't work the same way handling flag passing works in GCC. Fundamentally.

With GCC, the example is something like:

  if (strcmp(argv[1], "--help") == 0) { /* do help */ } else { /* handle it not being help, for example 'hlep' or whatever */ }

Here, GCC can precisely control what happens when you pass 'hlep'.

Compilers don't and can't work this way. There is no 'if (is_undefined_behavior(ast)) { /* screw the user */ }'. UB is a property of an execution, i.e. what happens at runtime, and can't _generally_ be detected at compile time. And you very probably do not want checks for every operation that can result in UB at runtime! (But if you do, that's what UBSan is!)
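To make the cost of runtime checks concrete, here is a rough hand-written sketch of a checked signed addition. It assumes GCC 5 or later, or Clang, since `__builtin_add_overflow` is a compiler builtin rather than standard C, and the name `checked_add` is made up for this example. It is only an analogy for the kind of check a sanitizer adds, not a description of how UBSan is implemented.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b in *out if the sum fits in an int.
   Returns false and leaves *out untouched if the addition would overflow.
   __builtin_add_overflow is a GCC/Clang builtin (an assumption of this sketch). */
bool checked_add(int a, int b, int *out) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return false;
    }
    *out = sum;
    return true;
}
```

Doing this by hand on every arithmetic operation is exactly the overhead most code does not want, which is why the instrumented build (`-fsanitize=undefined` in GCC and Clang) is opt-in.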

So, the only way to handle UB is either

1) Leaving the semantics of those situations undefined (== not occurring), and coding the transformation passes (so also opt passes) that way.

or

2) Defining some semantics for those cases.

But 2) is just implementation defined behavior! And that is what you're arguing for here. You want signed integer overflow to be unspecified or implementation defined behavior. That's fine, but a job for the committee.

> It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm glad I don't live in your country, where the C standard has been incorporated into law, making it illegal for compiler writers to do things that are helpful to programmers and end users, but aren't required by the standard.

> UB is supposed to allow C to be implemented on different architectures

No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures. In those cases, the implementation must define the behavior itself, and stick with it. UB, on the other hand, exists for compiler authors to optimize.

If you want to be mad at someone, be mad at the C standard for defining so much stuff as UB instead of implementation-defined behavior. Integer overflow should really be implementation-defined instead.
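As a small illustration of the difference, here is a sketch of a wrapping add that avoids signed overflow entirely. The name `wrapping_add` is made up for this example. Unsigned arithmetic is defined to wrap, so the addition has no UB. The conversion back to `int` is implementation-defined when the value is out of range, and GCC documents it as reduction modulo 2^N.

```c
#include <limits.h>

/* Adds two ints with wraparound semantics.
   The addition is done in unsigned arithmetic, which wraps modulo 2^N
   by definition. Converting an out-of-range result back to int is
   implementation-defined in C; this sketch assumes the usual
   two's-complement behavior that GCC and Clang document. */
int wrapping_add(int a, int b) {
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    return (int)sum;
}
```

The implementation has to pick a behavior for that conversion and document it, which is what separates it from the UB case.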

>No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures.

Is it? We're talking about integer overflow here.

I wasn't in the meetings when writing all the c standards. I'm not convinced this is purely an optimisation thing though.

I would guess the story is more like:

Interested party X: "can integer overflow do X?"

Party Y: "no because our processor doesn't work like that"

Party Z: "and it breaks K and R"

Party X: "how about implementation defined?"

Party A: "but our compiler targets 5 different processors"

Party B: "plus that precludes certain optimisations"

Not only to optimize but to write safety tools. If you defined all the behavior, and then someone used some rare behavior like integer overflow by accident, it'd be harder to detect that since you have to assume it was intentional.
> UB, on the other hand, exists for compiler authors to optimize.

Was this really the original reason why there's UB in the C standard, or has this been retconned by 'malicious compiler authors'? ;)

It is the original reason. For example, register allocation is possible because stack smashing is UB.
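A rough sketch of that argument, with hypothetical names: because writing outside an object's bounds is UB, the compiler may assume that stores through `out` never touch the local `total`, so `total` can stay in a register for the whole loop instead of being reloaded from the stack after every store. Whether a given compiler actually does this depends on the optimization level and target.

```c
/* Copies n values into out and returns their sum.
   total never has its address taken, so no valid pointer can refer to it.
   An out-of-bounds store through out that happened to hit total's stack
   slot would be UB, which lets the compiler keep total in a register. */
int copy_and_sum(int *out, const int *values, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        out[i] = values[i];
        total += values[i];
    }
    return total;
}
```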
UB is also very much based around software incompatibilities though, not just the ability to optimise stuff.

But where IB can have useful definitions to document, UB was defined so because the behaviours were considered sufficiently divergent that allowing them was useless, and so it was much easier to just forbid them all.

But then again, UB doesn't mean the compiler author can't treat it as implementation-defined and do something reasonable.
You're getting it backward. The only reason UB doesn't immediately stop compilation is backward compatibility with implementation-defined behavior: you don't want to break compilation of existing programs each time the compiler converges toward the C spec and identifies a case of undefined behavior.

And since you want some cross-compiler compatibility, you also import third parties' implementation-defined UB.

This is not some reasonable decision made on principle. The proper way would be to abort compilation on each instance of UB. The reality is that the proper way would be too harsh on existing codebases, pushing people to use a less strict compiler or to stop updating versions, which are undesirable outcomes for compiler writers.

I can't really follow. What would be wrong with making -fwrapv the default? I.e., let the compiler assume signed integers are two's complement on the relevant platforms (virtually everything in use today), then stop assuming "a + 1 < a" cannot be true for signed ints. How would that make existing code worse, or break it? It's basically what you already get with -O0 afaict, so any such program would already be broken with optimizations turned off.
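For reference, the two styles of check look roughly like this in C. The function names are made up for this example. The first relies on wraparound, so without -fwrapv an optimizer is allowed to fold it to false. The second compares against INT_MAX before doing any arithmetic, so it is well-defined under any flags.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed wraparound. Since signed overflow is UB, an optimizer
   may assume a + 1 never overflows and reduce this to "return false". */
bool increment_overflows_fragile(int a) {
    return a + 1 < a;
}

/* No arithmetic is performed on a, so no overflow can occur. */
bool increment_overflows_portable(int a) {
    return a == INT_MAX;
}
```

With -fwrapv the first version is also well-defined, which is the behavior being asked for here.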
There is nothing wrong. Except that a subset of GCC users prefer -ftrapv and another subset wants no overhead, so the status quo remains.
I think I misunderstood your comment, sorry, but then I have difficulty understanding how it's different from how things work already. You either have to rely on the compiler author having chosen what you expect (not the case here), or check for yourself and hope it won't change.
And sanitizers that throw warnings on undefined behavior do indeed exist.
>UB, on the other hand, exists for compiler authors to optimize.

s/exists for/has been exploited by/g

The worst part is the optimizations aren't even that significant. (I recall a blog post of somebody testing this but I can't find it rn)

> It is undefined behaviour if I write GCC --hlep

Well no, it's a compilation error, you need at the very least a semicolon after hlep and from there on it depends on what GCC is. If it's a function you need parentheses around --hlep, if it's a type you need to remove the --, if it's a variable you need to put a semicolon after it,...

Because GCC is all-caps I'm guessing it's a macro, so here's an example of how you could write it (though it won't be UB): https://godbolt.org/z/dYMddrTjj

I'm not sure if you're supporting my pov by showing the absurdity of the other position???

Yeah sure, if my phone auto incorrects gcc to GCC then that is technically meaningless so you're completely free to interpret my comment how you want.

..... Although..... GCC stands for GNU Compiler Collection so it can be reasonably capitalised, so maybe then, rather than saying anything goes we should do something reasonable because then you aren't left saying something really stupid if you're wrong???

The parent's point is that when the standard talks about UB, it refers to translating C code. So the parent cheekily interpreted your comment about command line flags (which are outside the remit of the standard) as code instead. I thought it was fitting.