by hermitdev 2947 days ago
Minor nitpick, the zeroing (or lack thereof) of the padding is not undefined behavior, it's unspecified behavior. Undefined behavior and unspecified behavior often look and perhaps behave the same to the programmer, but have semantic differences. In the face of undefined behavior, the compiler is allowed to do pretty much anything it wants (including formatting your hard drive and/or launching the nukes). With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.

> With unspecified behavior, the compiler implementer must make a conscious decision on what the behavior will be and document the behavior it will follow.

No, what you described is implementation-defined behavior.

It may be confusing, but here's the breakdown of different kinds of behavior in the C standard:

* Well-defined: there is a set of semantics that is defined by the C abstract machine that every implementation must (appear to) execute exactly. Example: the result of a[b].

* Implementation-defined: the compiler has a choice of what it may implement for semantics, and it must document the choice it makes. Example: the size (in bits and chars) of 'int', the signedness of 'char'.

* Unspecified: the compiler has a choice of what it may implement for semantics, but the compiler is not required to document the choice, nor is it required to make the same choice in all circumstances. Example: the order of evaluation of a + b.

* Undefined: the compiler is not required to maintain any observable semantics of a program that executes undefined behavior (key point: undefined behavior is a dynamic property related to an execution trace, not a static property of the source code). Example: dereferencing a null pointer.

Nice comment! Here are the excerpts from n1570.pdf[1] with some punctuation added by me to compensate for the limited formatting support on this forum:

§3.4: behavior: external appearance or action

§3.4.1: implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made. EXAMPLE: An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

§3.4.2: locale-specific behavior: behavior that depends on local conventions of nationality, culture, and language that each implementation documents. EXAMPLE: An example of locale-specific behavior is whether the islower function returns true for characters other than the 26 lowercase Latin letters.

§3.4.3: undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. NOTE: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). EXAMPLE: An example of undefined behavior is the behavior on integer overflow.

§3.4.4: unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. EXAMPLE: An example of unspecified behavior is the order in which the arguments to a function are evaluated.

[1]: WG14 working paper for the C11 standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Thank you, kind sir, you win the prize for the most coherent and informative technical snippet in the world. On this day, anyway.

Accessing uninitialised memory is undefined. As in nasal demon undefined. If you don't zero out the padding, I'm sure there's a clever way to access those bytes that invokes undefined behaviour.

The padding has unspecified values, which is distinct from being uninitialised.

If it were otherwise, you couldn't `memcpy()` structures around.

I believe there's an exception for the likes of `memcpy()`. Something along the lines of "type punning and reading indeterminate values is undefined except when we're reading through a `char*`", or something.

I'll check the unbelievably thick book that tries to specify C11 (I've printed it; it weighs over 2 pounds).

The only way to access the padding without otherwise falling into undefined behaviour is by using a char * anyway (including indirectly using a char *, through memcpy()).

You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Why not just have the compiler zero the memory, and thereby remove the trap? Seems very sensible to me. Do you think it’s a bad idea, and if so, why?

There is a concern for performance. But that's no reason: zero-initializing could be the default behavior, one that can be declared away. E.g.

As a type qualifier keyword:

  { int x, y; /* x and y are zero */ }

  { int noinit x, y; /* x is indeterminate, y is zero */ }

Or as a declaration specifier:

  { noinit x, y; /* both x, y indeterminately-valued */ }

Or a special constant for suppressing zero initialization:

  { int x, y = noinit; /* x zero, y indeterminate */ }

Similarly, unspecified order of evaluation could be supported by explicit request:

  decl (unspec_order) { /* comma-separated list of decl items */
     a[i] = i++; /* UB */
  }

  a[i] = i++; /* well-defined */

Zero-initialization is already the default for objects with static storage duration (static structs included):

    static int x, y; /* x and y are zero */

Good idea!

> You say that as if it’s entirely the responsibility of the programmer to avoid these bear traps that have been left lying around everywhere.

Oh no no no, I was picking on hermitdev's characterisation of the behaviour. Sure, what the compiler does with the padding bytes is unspecified. But it can still lead to undefined behaviour if some unwary user ever reads them. A misinterpretation like that, and poof, you get a security vulnerability. The C standard is insane.

Aha, sorry, I misunderstood! Glad to hear you’re on the side of sanity. :)

It's more than a logical or performance issue; it's a cultural one. In C culture, there's a sense that the programmer has direct control over the hardware. They can literally read or write any memory address as they see fit, and have very fine-grained control over what the machine is doing.

Modern processors, with their out-of-order execution, complex caching algorithms, deep pipelines and multi-threaded hardware, often render this sense of control more illusory than factual, but C culture remains wedded to the idea that the programmer is in control.

Accordingly, it's pretty provocative to suggest that a compiler or runtime should zero out memory without you specifically saying so. Any C programmer can call calloc (which zeroes memory) instead of malloc (which doesn't). Whether that's a good or bad idea is up to the programmer. The overall idea is: don't do anything unless I say so.

Sane compilers should do that. The standard should eventually specify it. But until it does, you cannot write portable code that expects it (though hopefully, once enough compilers are sane but before the standard is updated, you can write code that targets only those compilers and not give a fuck about the other broken garbage that tries to trap the world).

If you dereference a null pointer, the behavior is 'undefined' from the perspective of the C compiler; however, it is very well defined by the operating system (if you are writing a user-space application).

No nukes get involved here - only core dumps.

If you are working on top of an operating system that launches nukes upon null pointer access, then you should consider switching vendors.

> and document the behavior it will follow.

So it's unspecified in terms of the standard, but specified by the implementation.

Right! Basically it's up to the compiler programmer to pick a path and follow it... assuming you're talking about implementation-defined.