by benj111 1306 days ago
>No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations

What should happen when I invoke --hlep then? The program could give an error, or warn that it's an unrecognised flag. It could ask if you meant --help, or infer that you meant help and give you that. Or it could show a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UB doesn't mean it isn't. If it isn't defined then it's undefined. The question is what the reasonable thing to do is when someone types --hlep. I hope we can agree that reformatting your hard drive isn't the most reasonable thing to do.

>I think you're confusing UB with unspecified and implementation defined behavior

Am I? What's the reason for not defining integer overflow? Yes, unspecified behaviour could be used to allow portability, but so can undefined.

>It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.

I'm not saying it shouldn't be UB. I'm saying there are reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

C gives you enough rope to hang yourself. It isn't required for GCC to tie the noose and stick your head in it though.


> What should happen when I invoke --hlep then? The program could give an error, or warn that it's an unrecognised flag. It could ask if you meant --help, or infer that you meant help and give you that. Or it could show a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UB doesn't mean it isn't. If it isn't defined then it's undefined. The question is what the reasonable thing to do is when someone types --hlep. I hope we can agree that reformatting your hard drive isn't the most reasonable thing to do.

I honestly don't understand the point of this paragraph.

> Am I? What's the reason for not defining integer overflow? Yes, unspecified behaviour could be used to allow portability, but so can undefined.

Yes, you are confused about that. UB is precisely the kind of behavior where the C standard deemed it unsuitable to define as implementation defined or whatever, and it usually has really good reasons to do so. You could look them up instead of asking rhetorically.

> I'm not saying it shouldn't be UB. I'm saying there are reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.

Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen. The ability to do that is fundamentally the point of UB. Situations like the one in the article are not a specific act of the compiler to screw you in particular, but an emergent result.

Additionally, I think you're also confusing Undefined Behavior with 'behavior of something that is undefined'. These are not the same thing.

>Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen.

Which is as wrong as coding GCC to assume --hlep can't happen.

It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

If you don't understand my --hlep example how about: Int mian () {

What should the compiler do there? The same rules apply: should it reformat your hard drive, or warn you that it can't find such a function? There are reasonable and unreasonable ways to deal with behaviour that hasn't been defined.

If I put in INT_MAX + 1 it isn't reasonable to reformat my hard drive. The compiler doesn't have carte blanche to do what it likes just because it's UB. It should be doing something reasonable. To me, removing an overflow check isn't reasonable.

If you want to have a debate about what is reasonable, we can have that debate, but if you're going to say UB means anything can happen then I'm just going to ask why it shouldn't reformat your hard drive.

Again, you still don't understand.

> It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.

A compiler's handling of UB simply can't work the same way handling flag passing works in GCC. Fundamentally.

With GCC, the example is something like:

  if (strcmp(argv[1], "--help") == 0) { /* do help */ } else { /* handle it not being help, for example 'hlep' or whatever */ }

Here, GCC can precisely control what happens when you pass 'hlep'.

Compilers don't and can't work this way. There is no 'if (is_undefined_behavior(ast)) { /* screw the user */ }'. UB is a property of an execution, i.e. what happens at runtime, and can't generally be detected at compile time. And you very probably do not want checks for every operation that can result in UB at runtime! (But if you do, that's what UBSan is for!)
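
To illustrate the point that UB belongs to an execution rather than to the source text, here is a minimal C sketch. The function name `add_one` is made up for illustration. The same line is well defined for almost every argument and undefined only when `x` equals `INT_MAX`, so the compiler generally cannot flag it at compile time.

```c
/* Hypothetical helper: well defined for every x except INT_MAX.
   Whether this line has UB depends on the value passed at runtime. */
int add_one(int x)
{
    return x + 1; /* signed overflow (UB) only when x == INT_MAX */
}
```

Building this with GCC or Clang and `-fsanitize=undefined` should insert a runtime check that reports the overflow when it actually occurs, which is the per-operation checking mentioned above.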

So, the only way to handle UB is either

1) Leaving the semantics of those situations undefined (== not occurring), and coding the transformation passes (and so also the optimization passes) that way.

or

2) Defining some semantics for those cases.

But 2) is just implementation defined behavior! And that is what you're arguing for here. You want signed integer overflow to be unspecified or implementation defined behavior. That's fine, but a job for the committee.
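
For comparison, C already takes option 2 for unsigned arithmetic: the standard defines it to wrap modulo 2^N. A minimal sketch, with the hypothetical name `wrapping_add`:

```c
/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1,
   so this addition never has undefined behaviour. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}
```

Calling `wrapping_add(UINT_MAX, 1u)` yields `0u` on any conforming implementation. GCC and Clang also accept `-fwrapv`, which asks the compiler to treat signed overflow as wrapping too, at the cost of some optimizations.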

I get what's happening.

It's basically dead code removal. X supposedly can't happen so you never need to check for X.
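
A sketch of the pattern being discussed, in C. The article's actual code is not reproduced in this thread, so the function below is a hypothetical stand-in. The addition runs first, so if it overflows the program already has UB before the check is reached; an optimizer that assumes no overflow may treat the condition as always false and drop the branch.

```c
#include <stdbool.h>

/* Hypothetical example of a check placed AFTER the operation it guards.
   Intended to return false on overflow, true (with *out set) otherwise. */
bool add_100_checked_too_late(int x, int *out)
{
    int sum = x + 100;   /* UB here if x > INT_MAX - 100 */
    if (sum < x)         /* could only be true after an overflow, which the
                            compiler is allowed to assume never happens */
        return false;
    *out = sum;
    return true;
}
```

For inputs that do not overflow, the function behaves the same whether or not the branch survives optimization.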

The instance in the article is about checking for an overflow. The author was handling the situation. C handed him the rope, and he used the rope sensibly by checking for overflow. GCC took the rope and wrapped it around his neck. Fine, GCC (and C) can't detect overflow at compile time and doesn't want to get involved in runtime checks. Leave it to the user then. But GCC isn't leaving it to the user, it's undermining the user.

Re 2) (are you referring to GCC's committee or the C committee?)

I don't mind what it's deemed to be, I expect GCC to do something reasonable with it. Whatever happens, a behaviour needs to be decided by someone. Some of those behaviours are reasonable, some aren't. If you're doing a check for UB, the reasonable thing, to me, is to maintain that check.

I could make a choice when I write an app to assume that user input never exceeds 100 bytes. I could document it saying anything could happen, then reasonably (well many people would disagree) leave it there, that is my choice.

If you come along and put 101 bytes of input in, you would complain if my app then reformatted your hard drive. Wouldn't you also complain if GCC did the same?

There's at least a post a week complaining about user-hostile practices with regard to apps. Why do compiler writers get a free pass? If I put up code assuming user input would be less than 100 bytes, documented or not, someone would raise that as an issue, so why the double standard?

I'm not even advocating the equivalent of safe user input. I'm advocating that just because you go outside the bounds of what is defined, you still do something reasonable.

> If you're doing a check for UB, the reasonable thing, to me, is to maintain that check.

The problem is that you need to do the check before you cause UB, not after, and here the check appears after. If you do the check before, the compiler will not touch it.
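
A minimal C sketch of checking before the operation, so that no UB is ever executed. The function name `checked_add` is illustrative; it compares against `INT_MAX` and `INT_MIN` from `<limits.h>` instead of performing the addition first.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false without
   adding if the sum would not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b) return false; /* would overflow upward */
    if (b < 0 && a < INT_MIN - b) return false; /* would overflow downward */
    *out = a + b;                               /* now known to be in range */
    return true;
}
```

Neither comparison can overflow: `INT_MAX - b` is only evaluated for positive `b` and `INT_MIN - b` only for negative `b`. Because the checks contain no UB themselves, the compiler has nothing to assume away.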

The compiler can't know that this code is part of a UB check (so it should leave it alone), whereas this other code here isn't a UB check but is just computation (so it should assume no UB and optimise it). It just optimises everything, and assumes you don't cause UB anywhere.

Now, I'm not defending this approach, but C works like this for performance and portability reasons. There are modern alternatives that give you most or all of the performance without all these traps.

>for performance and portability reasons

Is it more performant?

How would you do the check in the article in a more performant way?

Philosophically I'm not sure it's even possible. Sure, you could do the check before the overflow, but any way you slice it that calculation ultimately applies to something that is going to be UB, so the compiler is free to optimise it out? Yes, you can make it unrelated enough that the compiler doesn't realise. But really, if the compiler can always assume you aren't going to overflow integers, then it should be able to optimise away 'stupid' questions like 'if I add x and y, would that be an overflow?'.
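
On the performance question, one option is a compiler builtin. GCC (since version 5) and Clang provide `__builtin_add_overflow`, which performs the addition and reports whether it overflowed; on common targets it typically compiles to an add followed by a test of the overflow flag. This is a compiler extension rather than standard C, and the wrapper name `add_with_builtin` is made up for illustration.

```c
#include <stdbool.h>

/* Returns true when a + b fits in an int, false on overflow.
   Relies on a GCC/Clang extension, not on standard C. */
bool add_with_builtin(int a, int b, int *out)
{
    /* The builtin returns true when the mathematically exact result
       does not fit in *out; in that case *out holds the wrapped value. */
    return !__builtin_add_overflow(a, b, out);
}
```

Since the builtin is defined for all inputs, there is no UB for the optimizer to reason from, and the check cannot be removed on that basis.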

>The compiler can't know that this code is part of a UB check

If it doesn't know what the code is then it shouldn't be deleting it. It has just rearranged code that it knows is UB, and it is now faced with a check on that UB. It could (and does) decide that can't possibly happen, because 'UB'. It could instead decide that it is UB, so it doesn't know whether this check is meaningful or not, and not delete the check.

This, to me, is the original point of UB. C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, so it leaves it to the programmer to deal with the situation. If the programmer knows he's working on 2s complement machines that overflow predictably, he can work on that assumption. The compiler isn't expected to know, but it should stay out of the way because the programmer does. The performance of C, as I understood it, comes from the overflow check being optional: you aren't forced to check. But you are required to ensure that the check is done if needed, or deal with the consequences.

Would you get rid of something you don't understand because you can't see it doing something useful? Or would you keep it because you don't know what you might break when you delete it? GCC in this case is deleting something it doesn't understand. Why is that not a bug?