Hacker News new | ask | show | jobs
by Arch-TK 1172 days ago
I agree that realloc was poorly defined for the 0 size case, I think UB or IDB both would have worked in this case to really drive that point home, the WG chose UB.

That being said, you're completely wrong about what UB means. Making use of UB may as well initiate the rise of zombie velociraptors. Except for the situation where your implementation explicitly specifies that it provides a predictable behaviour for a specific case of UB, there's literally no guarantee of what will happen. Assuming that the implementation will stick with some status quo and your code won't exhibit absolutely unusual behaviour is just naiive.

Please don't mislead people into thinking that it's ever a good idea to assume that undefined behaviour will be handled sensibly, this kind of mislead assumption is one of the major sources of bugs in C code.

3 comments

> this kind of mislead assumption is one of the major sources of bugs in C code.

This is not even close to be true. Most bugs in C code are from programmer mistakes, not from UB behavior. The exaggeration that is spread by some people regarding UB is close to absurd. If something is UB, it may generate different results in different situations, even with the same compiler. The standard is just clarifying this problem. A good compiler will do something sensible, or at least issue a warning when this situation is detected. If you have a bad compiler that does strange things with your code, it's not a defect of UB but the compiler instead.

Optimizing compilers don’t work like that. They can either deviate from the standard and leave it as defined behavior, or mark it UB and go with it as usual.

To get some insight by analogy, consider this set of constraints (unrelated to C):

  x <= 7
  2x >= 5
  …(more with x, y, z but not more constraining x)…
When you feed this to a linear constraint solver, you may get anything from 2.5 to 7 as x. E.g. 3.1415926. Not because a solver wanted to draw some circles, but because it transformed your geometric problem into an abstract representation for its own algorithm, performed some fast operations over it and returned the result. Nobody knows how exactly a specific solving method will behave wrt (underconstrained) x given that the description above is all you have.

When you feed UB into an optimizer, you feed a bit of lava into a plastic pipe, figuratively. You’ll get anything from program #2500…0000 to program #6999…9999, where “…” is few more thousands/millions of digits. Run some numbers from there as an .exe to see if something absurd happens.

The nature of UB and optimizers is that you either relax UBs into DBs and get worse efficiency, or you specify more UBs and get worse programming safety. What happens in between can be perceived as completely random. And the better/faster the optimizer is, the more random the outcome will likely be.

The exaggeration that is spread by some people regarding UB is close to absurd

UB-in-code is absurd by definition, no exaggeration here.

> Most bugs in C code are from programmer mistakes

These most often lead to the triggering of UB. The reason why programmer mistakes lead to confusing bugs instead of simple and straightforward bugs which are easy to catch in the development process is mainly because UB imposes no restrictions on what the compiler should do. In the vast majority of UB cases the compilers simply don't do anything, and assume it can't happen. This is why dereferencing a pointer and then checking if it's null ends up eliding the null check (because if you've dereferenced it, it can't be null, that would be UB). Accessing past the end of an array is UB so it can't happen, therefore your compiler won't check for it. Accessing past the end of an array and accidentally reading from/writing to another variable - likewise.

UB encompasses ALL behavior for which the standard does not provide an explicit definition. The reason why the C standard provides explicit instances of UB usually boils down to clarifying situations where people were confused about whether something was UB or not. But if the behaviour is not defined in the standard, then it is by definition UB.

If I am not wrong, one major security bug that C programs usually face is buffer overflow, which is an undefined behavior.
Right, this should have been left to the implementor if they didn't want to standardize one behavior. Making it UB is the worst possible outcome. Yes, people who write portable code will still want to not rely on `realloc()`'s freeing behavior, but if you do and your realloc() implementation doesn't, then you suffer a leak, while if you do and realloc() decides to wipe your drive and make your power supply explode...
> Except for the situation where your implementation explicitly specifies that it provides a predictable behaviour for a specific case of UB, there's literally no guarantee of what will happen.

That situation is "when you have UBSan turned on".