by somat 125 days ago
I find where the argument gets lost is when undefined behavior is assumed to be exactly that, an invariant.

That is to say, I find "could not happen" the most bizarre reading to make when optimizing around undefined behavior. "Whatever the machine does" makes sense, as does "we don't know". But "could not happen"? If it could not happen, the spec would have said "could not happen". Instead, the spec does not know what will happen and so punts on the outcome, knowing full well that it will happen all the time.

The problem is that there is no optimization to make around "whatever the hardware does" or "we have no clue", so the incentive is to choose the worst possible reading: "undefined behavior is incorrect code, and therefore a correct program will never have it".
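To make the "could not happen" reading concrete, here is a minimal C sketch (the function names are made up for illustration). Because signed overflow is undefined, a compiler is permitted to assume `x + 1` never overflows and fold the signed comparison to a constant 1. Unsigned arithmetic wraps by definition, so no such assumption is available there.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name. Signed overflow is undefined, so an optimizer is
   allowed to assume x + 1 never overflows and reduce this to "return 1".
   Calling it with INT_MAX would be UB, so the checks below avoid that. */
int next_is_greater_signed(int x) {
    return x + 1 > x;
}

/* Hypothetical name. Unsigned arithmetic wraps modulo 2^N by definition,
   so the comparison has to be evaluated for real. */
int next_is_greater_unsigned(unsigned x) {
    return x + 1u > x;
}

/* Only well-defined inputs are exercised here. */
void check_examples(void) {
    assert(next_is_greater_signed(41) == 1);
    assert(next_is_greater_unsigned(41u) == 1);
    assert(next_is_greater_unsigned(UINT_MAX) == 0); /* UINT_MAX + 1 wraps to 0 */
}
```

Whether a given compiler actually performs the fold depends on the compiler and the optimization level; the point is only that the standard allows it.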


Some behaviors are left unspecified instead of undefined, which allows each implementation to choose whatever behavior is convenient, such as, as you put it, whatever the hardware does. IIRC this is the case in C for modulo with negative operands.
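For reference, a small sketch of how the modulo case behaves, assuming a C99-or-later compiler. As far as I know, C89 left the rounding direction of `/` and the sign of `%` implementation-defined when an operand was negative, while C99 and later require truncation toward zero, so the result of `%` takes the sign of the dividend. The `mod_floor` helper is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: a modulo that is never negative, built on top of
   C99's truncating %. Assumes m > 0. */
int mod_floor(int a, int m) {
    int r = a % m;              /* C99: sign of r follows the sign of a */
    return (r < 0) ? r + m : r;
}

void check_modulo(void) {
    assert(-7 / 3 == -2);       /* truncation toward zero */
    assert(-7 % 3 == -1);
    assert(7 % -3 == 1);
    assert(-7 % -3 == -1);
    assert(mod_floor(-7, 3) == 2);
}
```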

I would imagine that the standard writers choose one or the other depending on whether the behavior is useful for optimizations. There's also the matter that if a behavior is currently undefined, it's easy to later on make it unspecified or specified, while if a behavior is unspecified it's more difficult to make it undefined, because you don't know how much code is depending on that behavior.

But even signed integer overflow is undefined.

It's practically impossible to find a program without UB.

I think this is not really true. Or rather, it depends on the UB you are talking about. There is UB which is simply UB because it is out of scope for the C standard, and there is UB such as signed integer overflow that can cause issues. It is realistic to deal with the latter, e.g. by converting it to traps with compiler flags.
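As a rough illustration of dealing with signed overflow: GCC and Clang accept `-ftrapv` and `-fsanitize=signed-integer-overflow`, which turn the overflow into a runtime failure. The same idea can be written by hand as a check that runs before the addition. This is only a sketch, and `checked_add` is a made-up name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports overflow instead of performing it.
   The range test happens before the addition, so no UB is executed. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;           /* the sum would not fit in an int */
    }
    *out = a + b;
    return true;
}

void check_checked_add(void) {
    int r = 0;
    assert(checked_add(2, 3, &r) && r == 5);
    assert(!checked_add(INT_MAX, 1, &r));
    assert(!checked_add(INT_MIN, -1, &r));
    assert(checked_add(INT_MAX, INT_MIN, &r) && r == -1);
}
```

The flags cover every signed operation in the program at once; the manual check is useful when only a few sites matter or when the failure needs to be handled rather than trapped.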
> I think this is not really true. Or rather, it depends on the UB you are talking about.

I mean, if you're going to argue that a compiler can do anything with any UB, then by all means make that argument.

Otherwise, no, I don't think it's reasonable for a compiler to cause an infinite loop inside a function simply because that function itself doesn't return a value.

When you say "cause", do you mean insert on purpose, or do you mean cause by accident? I could see the latter happening, for example because the compiler doesn't generate a ret if the non-void function doesn't return anything, so control flow falls through to whatever code happens to be next in memory. I'm not aware of any compiler that does that, but it's something I could see happening, and the developers would have no reason to "fix" it, because it's perfectly up to spec.
According to the author of the second link I gave (here it is again):

https://www.quora.com/What-is-the-most-subtle-bug-you-have-h...

The problem was that the loop itself was altered, rather than that the function returned and then that somehow caused an infinite loop.

> I'm not aware of any compiler that does that, but it's something I could see happening, and the developers would have no reason to "fix" it, because it's perfectly up to spec.

This is where we disagree.

I am not sure what statement you are responding to. I am certainly not arguing that. I disagree with your claim that "it is practically impossible to find a program without UB".
A study found that, for a particular subset of UB (code that had legal, detectable behavior changes at differing optimization levels), 40% of Debian Wheezy packages exhibited this UB.

https://people.csail.mit.edu/nickolai/papers/wang-stack.pdf

I submit that that's a small fraction of all UB, and that much of the rest would exist at any optimization level.