by gpderetta 531 days ago
The union trick is actually defined in C.

And note that while char can alias anything, the reverse is not true: you can't generally cast a char array to anything else and expect sensible behaviour. There are ways to make this work (placement new in C++, for example), but it is not a way to escape TBAA: if you store a float in a char array, you can't then read it through an int pointer with impunity.
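
To illustrate the point about char buffers, here is a minimal sketch of the usual workaround: copy the bytes with memcpy instead of casting the buffer to another pointer type. The helper names are hypothetical, and the expected bit pattern assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: store a float into a raw byte buffer. */
static void store_float(unsigned char *buf, float value) {
    memcpy(buf, &value, sizeof value);
}

/* Hypothetical helper: read the same bytes back as an integer.
   memcpy copies the object representation, so the stored float is
   never accessed through an lvalue of type uint32_t. */
static uint32_t load_bits(const unsigned char *buf) {
    uint32_t bits;
    memcpy(&bits, buf, sizeof bits);
    return bits;
}

/* Hypothetical helper: round-trip a float through a byte buffer. */
static uint32_t float_bits_via_buffer(float value) {
    unsigned char buf[sizeof(float)];
    store_float(buf, value);
    return load_bits(buf);
}
```

What this avoids is something like `*(uint32_t *)buf`, which is the cast the comment above warns about. On an IEEE 754 platform, float_bits_via_buffer(1.0f) should come out as 0x3F800000.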

To be more precise, it has been defined since C99[0]. In C89 it was undefined, but type punning is the most used/sensible behaviour, so they changed it in C99.

[0]: https://en.cppreference.com/w/c/language/union

That is a common misconception. DR 283 is a suggestion for an amendment that was filed 3 years after C99 was published:

https://open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

It is not part of C99. It also is not part of the C standard since no subsequent C standard adopted it according to the GCC developers:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13

A read of the C11 standard draft, which would have this amendment if it were accepted by the C standards committee, shows that this has not been added:

https://open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Type punning via union types is therefore undefined behavior unless your compiler implements an extension to define it like GCC and Clang do.

Hmm, from your WG14 link: 6.5.2.3 paragraph 3 and note 95. I thought that was the note that was added in TC3.

Also the note is non-normative, so it is only clarifying preexisting behaviour.

But I'm far from an expert on the C standard. Also that was the C11 draft, maybe the note was removed before the final standard.

Edit: I believe the aliasing rules are in 6.5 paragraph 7; specifically:

> An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

[...]

> an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),

Edit2: neither the paragraphs nor the note have changed in the 202y draft.

You need more language in order to say that type punning is allowed. Implicitly, only reads through the type of the last write are permitted, and anything else is undefined behavior. At least, this is my understanding based on my own read and the guidance from the GCC developers.

From a cursory search I can't find any language in the C standard that disallows reading from a member other than the last one written.

I'm familiar with such language in the C++ standard.

Edit: On the contrary, this note 92:

> 92) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This can possibly be a non-value representation.

Edit2: and that's specifically the text that was added by DR 283. I think you might be confusing it with a different DR (I don't remember the number) that specifically asked whether generalized type punning was possible as long as a union containing the aliased types was visible in the translation unit. I think that one is still open, although GCC definitely forbids it.
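
For reference, this is the pattern the quoted note describes, as a minimal sketch. The union and function names are made up for illustration, and the expected values assume a 32-bit IEEE 754 float. Whether the standard itself guarantees this is exactly what is being argued in this thread; as mentioned upthread, GCC and Clang define it either way.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical union type, used only for this illustration. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Store through one member, then read through the other.
   Under the reading of the note quoted above, the bytes of the float
   are reinterpreted as a uint32_t object representation. */
static uint32_t bits_of(float value) {
    union float_bits pun;
    pun.f = value;
    return pun.u;
}
```

Note that the access goes through the union object itself. Taking a float * and a uint32_t * to the two members and passing them around separately is the "generalized" case, which is a different question.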

Where did you find that note? I do not see it in the C standard draft I linked.

Oh, that's interesting. I guess I should actually look at the standard instead of taking cppreference's word for it next time.

Yes, it's 2025; I thought that we could at least assume C99 when talking about plain C :).

I'm probably an optimist.

It does not matter. The C99 standard does not define this behavior:

https://news.ycombinator.com/item?id=42568271