by pansa2 411 days ago
> Any access of memory using a union, where the union includes the effective type of the memory is legal. Consider:

    union {
        int i;
        float f;
    } *u;
    float f = 3.14;
    u = &f;
    x = u->i;
> In this case the memory pointed to by “u” has the declared effective type of int, and given that “u” is a union that contains int, the access using the “i” member is legal. It’s noteworthy in this that the “f” member of the union is never used, but only there to satisfy the requirement of having a member with a type compatible with the effective type.

Is this a typo? Should it say "declared effective type of float" and "“u” is a union that contains float"?

It's interesting to see type-punning using a union - I've read that it should be avoided and to use `memcpy` instead. Are there any issues with the union approach in C? Or is the advice to prefer `memcpy` specific to C++, where AFAICT the union approach is undefined behaviour?

1 comments

> type-punning using a union - I've read that it should be avoided and to use `memcpy` instead

The other day we had standards committee members confirming that union punning is fine in C: https://news.ycombinator.com/item?id=43793225

Looks to me like union-based type-punning in C is indeed "better than C++" (in C++ it's just plain undefined behavior). In C, it looks like the behavior is defined unless you hit a trap representation.

https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p5

> Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...] Such a representation is called a trap representation.

https://port70.net/~nsz/c/c11/n1570.html#6.5.2.3p3

> A postfix expression followed by the `.` operator and an identifier designates a member of a structure or union object. The value is that of the named member. [Footnote: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ''type punning''). This might be a trap representation.]

I'm fuzzy on exactly what a "trap representation" might be in real life. I have the impression that a signaling NaN isn't. I suspect that a visibly invalid pointer value on a CHERI-like or ARM64e-like platform might be. Anyway, my impression is that sane platforms don't have trap representations, so indeed, you have to go out of your way to contrive a situation where C's paper standard would not define union-based type-punning to have the "common-sense" physical behavior. Pointer-cast-based punning is a separate matter: in C it falls under the effective-type rule (6.5p7), so reading a `float` object through an `int *` is undefined regardless of trap representations.

Again this is different from C++, where both union-based type-punning and pointer-cast-based type-punning have UB, full stop:

https://eel.is/c++draft/expr.prop#basic.lval-11

> An object of dynamic type Tobj is _type-accessible_ through a glvalue of type Tref if Tref is similar to Tobj, a type that is the signed or unsigned type corresponding to Tobj, or a char, unsigned char, or `std::byte` type.

> If a program attempts to access the stored value of an object through a glvalue through which it is not type-accessible, the behavior is undefined.

Thanks, I didn’t see that discussion at the time.