A macro is effectively preprocessing facilitated by the language. You could always preprocess externally if you wanted, and there's nothing stopping you from doing that in the "Powerful Language (TM)" either.
Now whether people use macros and preprocessing usefully is another question, but not one to which the answer is "abolish macros for more language features". When used correctly, macros ARE power.
Macros are useful, as long as they're used sparingly. I think that in this case, it's used well - the struct is still perfectly readable, and the sole purpose of it is to make it so that you don't have to manually name the dummy fields. But you could totally just write out dummy1, dummy2, dummy3 etc. yourself if you want to get rid of the macros.
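For illustration, here is roughly what writing the dummy fields out by hand looks like. This assumes the struct follows the common pattern of a union of anonymous structs, where each real field is preceded by a padding array sized to its offset; the article's actual macro isn't reproduced here, and `struct device_regs`, `field_a`, `field_b` and the offsets are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: field_a must sit at offset 0x10, field_b at 0x24.
   Members of anonymous structs all land in the enclosing struct's
   namespace, so every padding array needs a distinct name. */
struct device_regs {
    union {
        struct { unsigned char dummy1[0x10]; uint32_t field_a; };
        struct { unsigned char dummy2[0x24]; uint32_t field_b; };
    };
};

/* Cheap insurance that the layout came out as intended. */
_Static_assert(offsetof(struct device_regs, field_a) == 0x10, "field_a offset");
_Static_assert(offsetof(struct device_regs, field_b) == 0x24, "field_b offset");
```

Two fields are manageable by hand; with dozens, keeping dummy1..dummyN unique is the chore the macro removes.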
> Macros are useful, as long as they're used sparingly.
Everybody says that. Everybody believes it. And everybody goes to town making a rat's nest with macros, just like that snarl of cables under my desk that resist all attempts to make it nice.
Myself included. I've even written an article about clever C macros. Look, ma! I was so proud of myself.
But then I got older. I started replacing the macros in my C code with regular code. It turns out they weren't that necessary at all. I liked the C code a lot better when it didn't have a single # in it other than #include.
I want to be clear about your meaning, because I don’t know if I’m reading your comment correctly. Are you referring explicitly to syntax based, preprocessor macros? Or does your comment extend to other metaprogramming techniques? I am inclined to think you mean the former, considering the amount of emphasis on generic programming in D. Just curious.
I'm referring to both syntax based (AST) macros, and text based (preprocessor) macros. The latter, of course, are much worse.
An example of the former is so-called "expression templates" in C++. I've seen them used to create a regular expression language using C++ expression templates. The author was quite proud of them, and indeed they were very clever.
However nice the execution, the concept was terrible. There was no way to visually tell that some ordinary code was actually doing regular expressions.
C++ expression templates had their day in the sun, but fortunately they seem to have been thrown onto the trash pile of sounds-like-a-good-idea-but-oops.
(I wrote an article showing how to do expression templates in D, mainly to answer criticisms that D couldn't do it, not because it was a good idea.)
> Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.
> I'm referring to both syntax based (AST) macros ...
This surprises me greatly. Various lisps are among the most powerful languages I know of and a large part of the reason is macros coupled with their ability to execute arbitrary code at compile time (which itself uses additional macros, which in turn invoke more code, and so on). What's your take on this?
I'm a noob when it comes to Lithp. But I'm told what happens with their macros is the language is fairly unusable until you write lots of macros. The macros then become your personal undocumented wacky language, which nobody else is able to use.
I've seen this happen with assembler macro languages, too.
> most powerful
It's like putting a 1000 hp motor in a car. Its main use is to wreck the car and kill the driver.
BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.
I don't know what code Walter is referring to but I had similar experiences during a presentation of the Boost Spirit library (at our Silicon Valley C++ Meeting; must be about a decade ago now).
The speaker was proud and beaming for showing how powerful C++ is and the audience was in awe.
I was incredulous! Jaw open! The "solution" was horrible, with a bunch of workarounds for a bunch of shortcomings. It was an "emperor has no clothes" moment for me.
Boost Spirit may be better today with newer C++ features; I don't know.
Oh man, you sure know how to drive a stake in my heart! What have I done! Some curses should just not be uttered. I'm not sure where it is on my backups.
But it was done just like you'd do it in C++. The same thing, just with D syntax.
Doesn’t work in a lot of cases, unfortunately. If you’re writing a library designed to be consumed by other languages, you’re stuck writing C-ABI-compatible code that those languages can “extern”, and that puts limits on what’s possible in those libraries.
It is a terrible thing. It's possible to do without them, and you'll like your code better. Your symbolic debugger and syntax directed editor will work properly. The poor schlub who has to fix the bugs in your code after you leave will be grateful. Your spouse will be happy and your children will prosper.
For example,
#define foo(x) ((x) + 1)
Replace with:
int foo(int x) { return x + 1; }
The compiler will inline it for you. There's no penalty.
#define FOO 3
Replace with:
enum { FOO = 3 };
or:
const int FOO = 3;
Replace:
#if FOO == 3
bar();
#endif
with:
if (FOO == 3) bar();
The optimizer will delete bar() if FOO is a manifest constant that is not 3.
Having made a significant contribution to Simon Tatham's PuTTY, I have to respectfully disagree. At the time the entire SSHv2 key exchange and user authentication protocols were implemented in a single function using a massive Duff's device (for asynchrony) implemented with C metaprogramming macros. It was a surprisingly pleasant experience.
I think it was RMS who argued that conditional compilation should be considered harmful in general. Code that you put in an inactive #if(def) block will not be maintained, and is basically guaranteed to rot. If it's needed in the future, it'll likely have to be rewritten from scratch.
According to this stance, any code that's suppressed by the C preprocessor should either be written in an if {} statement so that it will at least continue to compile as the surrounding code changes, or be replaced with comments describing what it does (or did), if it's important enough to keep track of.
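A minimal sketch of that stance, using a hypothetical `TRACE_ENABLED` constant and `checksum` function: the debugging output lives in an ordinary `if` instead of an `#if` block, so the compiler keeps checking it as the surrounding code changes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch: a manifest constant instead of #define/#ifdef. */
enum { TRACE_ENABLED = 0 };

static int checksum(const unsigned char *buf, size_t len) {
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
        /* Unlike code inside #if 0, this call is still parsed and
           type-checked on every build, so it cannot silently rot.
           With a constant 0 condition, an optimizing compiler is
           expected to drop it entirely. */
        if (TRACE_ENABLED)
            fprintf(stderr, "byte %zu = %d, running sum %d\n", i, buf[i], sum);
    }
    return sum;
}
```

If someone later renames `sum` or changes the type of `buf`, the disabled trace line breaks the build immediately instead of waiting years for someone to flip the switch.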
Can't really think of many good counterarguments to this. Machine dependence might be one, but then you could argue that the preprocessor is being used to cover up for an inadequate HAL.
One of the very few things from C that I miss in C++ is anonymous structs and enums. I really don’t understand why they are not allowed.
That is, C style enums don’t have to have a name but “type safe” (enum class) ones do. One classic use is to name an otherwise boolean option in a function signature; there’s typically no need to otherwise name it.
C++ incompatibly requires a name for all struct and class declarations, again a waste when you will only have a single object of a given type.
> I really don’t understand why they are not allowed.
I don't, either. Such were in D from 2000 or so.
I also don't understand why `class` in C++ sits in the tag name space. I wrote Bjarne in the 1980s asking him to remove it from the tag name space, as the tag name space is an abomination. He replied that there was too much water under that bridge.
D doesn't have the tag name space, and in 20 years not a single person has asked for it.
This did cause some trouble for me with ImportC to support things like:
struct S { ... };
int S;
but I found a way. Although such code is an abomination. I've only seen it in the wild in system .h files.
The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.
I think C++ is adding them, though.
What I'd like in c is designated function parameters.
// these are the same
bar(.a = 10, .b = 12);
bar(.b = 12, .a = 10);
The main drawback is that all parameters are now optional: it will not complain if you forget to assign all parameters, it will silently set them to 0 :-/
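Plain C doesn't have this, but a common approximation wraps the parameters in a struct and uses a variadic macro plus a C99 compound literal, so the call site reads like the `bar` example above. A sketch with hypothetical names (`struct bar_args`, `bar_impl`):

```c
/* Hypothetical function: bar's parameters travel in a struct. */
struct bar_args {
    int a;
    int b;
};

static int bar_impl(struct bar_args args) {
    return args.a * 100 + args.b;
}

/* A compound literal with designated initializers lets callers name the
   arguments in any order. Any member the caller leaves out is
   zero-initialized, which is exactly the drawback described above. */
#define bar(...) bar_impl((struct bar_args){ __VA_ARGS__ })
```

Both orderings produce the same call, and `bar(.a = 10)` compiles without complaint and quietly passes `b = 0`.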
> The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.
I don't understand. How would struct or class initialization be any different from simply doing, say, `for (auto& a : { x, y, z }) frob (a);` which is perfectly legal?
I didn't mention it. I think the concern was: with designated initializers, what is the order of initialization? The order of the elements in the struct, or the order in which they're written in the initializer? In C it probably matters little, as side effects are usually blatant. In C++, I think, cryptic side effects are common.
No, they are forbidden by the standard (take a look at cppreference). Some compilers implement the C behavior as an extension, so tell your compiler to follow the standard strictly.
I don’t use extensions, even convenient ones, as I have to be able to run my code on a variety of compilers. If you don’t have to do that, some extensions (like this one) are really handy.
From the standard, the enum-name is marked as optional in the enum grammar in [dcl.enum](https://eel.is/c++draft/dcl.enum#11); it is also referenced, e.g., in [dcl.dcl]:
> An unnamed enumeration that does not have a typedef name for linkage purposes ([dcl.typedef]) and that has a first enumerator is denoted, for linkage purposes ([basic.link]), by its underlying type and its first enumerator; such an enumeration is said to have an enumerator as a name for linkage purposes.
> A class-specifier whose class-head omits the class-head-name defines an unnamed class.
So both are entirely fine (and likewise, unions are too).
Note that my links are for the current draft, but I just checked and this was already the case as far back as C++11. So I wonder where this persistent myth comes from.
This is awesome. I referenced cppreference, but that is not authoritative. Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!
The part of enums you quoted was C-compatible enums; anonymous scoped enums are explicitly forbidden: "The optional enum-head-name shall not be omitted in the declaration of a scoped enumeration" (dcl.enum 2).
Sigh. I will send in a clarification at least on the class/struct/union side. Ideally the grammar would be fixed rather than that paragraph.
Each instance of "enum struct { abandon, save }" would denote a different type, yes? How would you write a compatible definition to go with your prototype?
I don’t care that they are different types; if anything that would be a feature.
The point is to prevent the “mysterious bool arguments” class of error.
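In C, the closest cheap version of this is a named enum in place of the `bool`. A small sketch with a hypothetical `close_document`; note that C enums convert to and from `int` implicitly, so unlike `enum class` this buys readability at the call site rather than type safety:

```c
#include <stdbool.h>

/* Before: at a call site, close_document_flag(true) says nothing about
   what `true` means. (Hypothetical function; the return value just
   reports which branch was taken.) */
static int close_document_flag(bool save) {
    return save ? 1 : 0;
}

/* After: the option is spelled out wherever it is used. */
enum close_mode { CLOSE_ABANDON, CLOSE_SAVE };

static int close_document(enum close_mode mode) {
    return mode == CLOSE_SAVE ? 1 : 0;
}
```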
The question is whether ADL could infer the scope of the enum, the way template instantiation can now infer the right thing without always needing the <T> notation.
IMO you can still be explicit about field offsets by writing the struct in a usual way, and using static assertions to ensure offsets match the intended layout.
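A sketch of that approach with a made-up `struct packet_header`. The asserted offsets assume a mainstream ABI where `uint16_t` and `uint32_t` are naturally aligned; on a platform where that doesn't hold, the build fails rather than silently producing a different layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-the-wire header, written as an ordinary struct. */
struct packet_header {
    uint8_t  version;
    uint8_t  flags;
    uint16_t length;
    uint32_t sequence;
};

/* The intended layout, stated once and checked on every build. */
_Static_assert(offsetof(struct packet_header, version)  == 0, "version offset");
_Static_assert(offsetof(struct packet_header, flags)    == 1, "flags offset");
_Static_assert(offsetof(struct packet_header, length)   == 2, "length offset");
_Static_assert(offsetof(struct packet_header, sequence) == 4, "sequence offset");
_Static_assert(sizeof(struct packet_header) == 8, "packet_header size");
```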
All undefined behaviour is well defined for each compiler; what it really means is implementation defined, and subject to change without notice or documentation with every compiler version, host OS, set of enabled flags, or a thousand other things.
Why use an approach which strictly relies on specific versions of specific compilers, rather than a completely portable and standard compliant struct with char array and a few char pointers? Or if you want a convenient interface and aren't explicitly writing for the kernel, switch to a restricted subset of C++ and do it right?
There are two kinds of undefined behaviour being invoked in using this. It's a horrible idea and a horrible code smell; get rid of it if you ever see something like this.
I don't see any undefined behavior here. As I mentioned below, gcc explicitly documents type punning via unions as being well defined. But yes, this is compiler specific and is not guaranteed to work elsewhere.
Accessing packed struct members works fine on x86, but will blow up at runtime or do weird things on platforms which don't support unaligned loads or stores.
The correct way to access packed structs is through memcpy, just like you'd access any other potentially unaligned object.
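A minimal sketch of that, assuming GCC or Clang (`__attribute__((packed))` is a compiler extension, not standard C) and a hypothetical `struct wire_record`. The field is copied out byte-wise at its `offsetof` position, so no `uint32_t *` pointing at a misaligned address is ever formed:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: no padding, so `value` sits at offset 1. */
struct __attribute__((packed)) wire_record {
    uint8_t  tag;
    uint32_t value;
};

/* Read the possibly misaligned field through memcpy instead of
   dereferencing a uint32_t pointer. */
static uint32_t record_value(const struct wire_record *r) {
    uint32_t v;
    memcpy(&v, (const unsigned char *)r + offsetof(struct wire_record, value), sizeof v);
    return v;
}
```

The pattern this avoids is storing `&r->value` in a `uint32_t *` and dereferencing it later, which is the case GCC's `-Waddress-of-packed-member` warning is meant to catch.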
For architectures where unaligned accesses are illegal, gcc will generate multiple load/store instructions when accessing packed struct fields by name. The main caveat to look out for is taking the address of a packed struct member and then dereferencing it.
There is absolutely undefined behaviour there.
Undefined behaviour is defined not as nasal daemons but as:
The compiler implementer does not guarantee that this behaviour will be hardware, circumstance, compiler version, or os consistent, nor that we will warn if we change this.
Packed is technically not undefined behaviour, but it is certainly a trap, especially because compiler-detection macros lead people to write defines that select the packing syntax per compiler automatically. The fallback case for an unrecognized compiler is then just left empty, meaning the code still compiles but no longer does what you think.
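One way to avoid the silently empty fallback, sketched under the assumption that only GCC and Clang matter (the `PACKED` macro and `struct wire_header` are hypothetical; MSVC spells packing as `#pragma pack` and isn't handled here): make the unknown-compiler branch an `#error`, and check the resulting size with a static assertion as a second line of defence.

```c
#include <stdint.h>

/* Pick the packing syntax per compiler, and fail loudly instead of
   expanding to nothing on a compiler we don't recognize. */
#if defined(__GNUC__) || defined(__clang__)
#  define PACKED __attribute__((packed))
#else
#  error "PACKED: unrecognized compiler; add its packing syntax here"
#endif

struct PACKED wire_header {
    uint8_t  tag;
    uint32_t value;
};

/* Even if the macro is edited badly later, an unpacked layout won't compile. */
_Static_assert(sizeof(struct wire_header) == 5, "wire_header is not packed");
```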
You don't get to decide what UB means. It really does mean nasal demons are a possibility: all bets are off when you run that executable. Use of the term "undefined behaviour" to mean something else may be on the increase, unfortunately (https://mars.nasa.gov/technology/helicopter/status/298/what-...), but if we're talking about C, its meaning is fixed.
I'm using anonymous nested structs extensively for grouping related items, but I consider the extra field name a feature, not something that should be hidden:
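A minimal sketch of that style, with hypothetical names: the inner struct types are unnamed, but the members `pos` and `size` are not, so the grouping shows up at every use.

```c
/* Hypothetical example: related fields grouped under named members.
   The inner struct types are anonymous, the members are not. */
struct window {
    struct { int x, y; } pos;
    struct { int w, h; } size;
};

static int window_area(const struct window *win) {
    return win->size.w * win->size.h; /* the group name is visible here */
}
```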
The result is uglier and less maintainable than a pair of macros. Or just stop trying to hide syntax. This is ultimately on the same level as typedefing pointers.
You're right; thanks for noticing and I've updated the first example. My C is a bit rusty these days and I didn't check it with a compiler the way I should have.
Doesn’t matter for C, but in C++ this could make your constexpr functions UB since you can only use one member of a union in constexpr contexts (the “active” member).
Hmm. How do C++ namespaces help with the structure naming problem in this example? They seem completely orthogonal.
C++ namespaces are a way to avoid library A's symbol "cow" clashing with library B's symbol "cow" without everything being named library_a_cow and library_b_cow all over the place which is annoying. I agree C would be nicer with such a namespace feature.
However this technique is about what happens when you realise your structure members x and y should be inside a sub-structure position, and you want both:
d = calculate_distance(s.x, s.y); // Old code
and
d = calculate_distance(s.position.x, s.position.y); // New
First of all, C++11 may feel like thirty years ago, and certainly some of its proponents look thirty years older than they did at the time, but it was only ten years ago. C++ namespaces date to standardisation work (so after the 1985 C++ but before the 1995 standard C++) but they don't get this job done. Inline namespaces are a newer feature.
Secondly this technique does something different. The C hack doesn't touch the old code. But this "inline namespace" trick means old code has to explicitly opt into this backward compatibility fix or else it might blow up.
Lastly, I didn't try this, but presumably you did. Are the two separately namespaced classes the "same thing" as far as type checking is concerned? A vital feature of this union trick is that it's just one structure, it type checks as the same structure because it is the same structure. At a glance, I think the C++ solution results in two types with similar names, so that would fail type checking.
Ah, another of those threads. OK, let's set the years straight.
Yes, inline namespaces were only introduced in C++11, about 10 years ago. Now let's dive into the article.
"Learning that you can use unions in C for grouping things into namespaces"
Grouping into namespaces, so when did C++ get said feature?
ANSI/ISO C++98 was released to the world in September 1998, which makes around 23 years, or 24 years if we count C++ compilers that already supported it the year before, like Borland C++.
This C hack definitely does touch old code, as it requires the code to be written to take advantage of the technique, and it is touched again when changes to the structs are required.
And naturally recompilation.
With inline namespaces, assuming recompilation, you can naturally also change which set of identifiers and type aliases are visible by default.
Based on the writeup, this technique isn't really about enabling you to start writing `s.position.x` where the old code would have written `s.x`. If that were all you wanted, you'd just keep writing `s.x`. It's about enabling you to write `s.x` everywhere, in old code and new code, while also being able to pass `s.position` to memcpy calls. You're never supposed to write `s.position.x`.
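A sketch of the technique as described here, with a hypothetical `struct sprite`: the same two floats are reachable as `s.x`/`s.y` through an anonymous struct and as the single object `s.position` through a named struct of the same shape, with a C11 anonymous union overlaying the two. The outer type remains one struct.

```c
#include <string.h>

/* Hypothetical type. x and y live in an anonymous struct (so s.x and
   s.y keep working) overlaid with a named struct of the same shape
   (so s.position can be handled as one object). */
struct sprite {
    int id;
    union {
        struct { float x, y; };
        struct { float x, y; } position;
    };
};

/* Copies only the position group; everyday code never spells
   s.position.x, it keeps using s.x and s.y. */
static void copy_position(struct sprite *dst, const struct sprite *src) {
    memcpy(&dst->position, &src->position, sizeof dst->position);
}
```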
That's nice, but the blog post doesn't say anything about migrating libraries across versions? Looking at the comment thread, I see tialaramex's sibling comment suggested the blog post was about migration, but it's not.
I suppose migration is another possible use case for the union trick, and for that case C++ inline namespaces can be used as part of an implementation that achieves a broadly similar goal, but in a completely different way. As tialaramex notes, with inline namespaces you still end up with two different types.
Constexpr unions are the sane/safe way to use them. It's great, because if you access a member which isn't the last one written, constant evaluation will reject it at compile time. Whereas all the other examples here are explicitly undefined behaviour!
This is also known as the most common invocation of undefined behaviour in game programming. If you do this, write to y, then read from [1], you are invoking undefined behaviour, and compilers doing different things here between Windows, Linux, Mac, and different compiler versions is a common cause of "why isn't my game working right on XXX, it works fine on YYY" questions.
Type punning is not undefined, it's implementation defined in C. In practice, every major C compiler will be fine with type punning, though it may disable some optimizations.
The story is different in C++, but in practice many compilers support it the same as in C. Especially for games, where VC++ (PC, Xbox) and Clang (PS4/PS5) are the most commonly used compilers, it also works as expected. The trick is to only use type punning for trivial structs that don't invoke complications like con/de-structors or operators. The GP's example of a Vec3 struct that puns float x,y,z with float[3] is a very common one in games.
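For reference, the Vec3 construct being argued about looks roughly like this in C11 (names are illustrative). Whether reading `e[1]` after writing `y` is allowed in C is exactly what this thread disputes, and it is undefined in standard C++. The code also assumes there's no padding between `x`, `y` and `z`, which the standard doesn't promise, so a static assertion checks it:

```c
/* C11 anonymous struct inside a union: v.x and v.e[0] name the same
   float, provided there is no padding between x, y and z. */
typedef union {
    struct { float x, y, z; };
    float e[3];
} vec3;

_Static_assert(sizeof(vec3) == 3 * sizeof(float), "unexpected padding in vec3");

static float vec3_sum(vec3 v) {
    float s = 0.0f;
    for (int i = 0; i < 3; i++)
        s += v.e[i]; /* reads through the array member */
    return s;
}
```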
Something being very common and being a very common source of portability issues isn't exactly contradictory. It's a bad idea, and modern game programming courses outright teach that it's a bad idea, while it's common in older guides, specifically because it caused so many problems. I'm pissed at this specific construct because I got it handed to me in a huge game library and had to spend a long time figuring out why it wasn't working in rare but important cases.
But my point is that on the platforms that matter, it's not really a source of portability issues, and not a problem. For gamedev, anything outside of VC++ and Clang are niche and thus largely ignored.
OP probably means C++, where it is indeed undefined behavior. Not sure about C. I doubt this would cause a "my game does not work on XXX", though. Is there really a compiler out there that will handle such abuse differently?
Yes, it's undefined behaviour in both C and C++. Yes, a number of compilers treat this differently; it's also poorly supported on custom hardware using standard compilers like gcc. So compiling for some mobile device with slightly custom ... good luck.
Bleurgh. I have a deep soft spot for C, and I'm known to get twisted pleasure from using obscure language features in new ways to annoy people, but this is a level of abuse that even I can't get behind. If you need namespacing, use C++. As much as I love C, it's terrible for large projects.
The Linux kernel is a large project and clearly C is sufficient for it, given the fact that migrating to C++ would probably be very easy (not using all C++ features, but just selected ones), yet it did not happen.
I think that C++ is better than C, but C is not that bad, even for large projects.
> Linux kernel is large project and clearly C is sufficient for it
Sure, and operating systems have been written in assembly too. The question is whether it would be better than just sufficient if Linux were written in C++, today (i.e. C++17 or 20, not something old). Switching now probably wouldn't be feasible (even ignoring technical reasons, the kernel developer community is familiar with the C codebase and code standards and bought into it), but if Linux were started today, would it be a better choice?
Maybe the answer is still no and C would still be chosen, but the choice today is very different than it was when Linux was started. Of course, maybe Rust or something would be chosen today instead.
Cantrill gave a talk in which he touches on C, C++ and Rust for systems programming [1].
His tl;dr being that Rust feels very much like a proper systems programming language, and more of a « better C » than C++.
I don't entirely know what to make of it, but my instinct is that something like C++ with such an opportunity space for baroque concoctions (leading to an obsession with design patterns) is just playing with fire.
Yeah they should have upgraded to some restricted subset of C++ or new restrictive language ages ago. I mostly buy the arguments against having exceptions, perhaps even against polymorphism in general, but the argument against destructors, or atomics... hell no.
C has polymorphism. Inheritance-based virtual dispatch is just one kind of polymorphism. It's common to wire up polymorphism in C with bespoke data structures using tagged unions or function pointers. Changing an implementation at link time is even a form of polymorphism.
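A minimal sketch of the function-pointer flavour, with hypothetical `shape_vtbl`, `square` and `rect` types: each object starts with a pointer to a table of operations, and callers dispatch through it without knowing the concrete type.

```c
/* Hypothetical "interface": a table of function pointers. */
struct shape_vtbl {
    double (*area)(const void *self);
};

/* Every concrete shape starts with a pointer to its table. */
struct square { const struct shape_vtbl *vtbl; double side; };
struct rect   { const struct shape_vtbl *vtbl; double w, h; };

static double square_area(const void *self) {
    const struct square *s = self;
    return s->side * s->side;
}

static double rect_area(const void *self) {
    const struct rect *r = self;
    return r->w * r->h;
}

static const struct shape_vtbl square_vtbl = { square_area };
static const struct shape_vtbl rect_vtbl   = { rect_area };

/* A pointer to a struct, suitably converted, points to its first
   member, i.e. the table pointer, so dispatch needs no type tag. */
static double shape_area(const void *shape) {
    const struct shape_vtbl *const *vtbl = shape;
    return (*vtbl)->area(shape);
}
```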
And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.
> C++ would probably be very easy
Not necessarily. Besides some (small?) problems due to C++ allowing "more magic optimizations" than C, they would be switching to a subset of C++, and you would need to communicate to all contributors that a lot of C++ things are not allowed. It might be easier to simply not use C++. I mean, if it were that easy, the kernel likely would have switched.
A big issue with introducing C++ into a codebase is that it's incredibly hard to stick to a particular subset or standard. There's always a well-justified argument for the next standard or "just this one additional feature". Eventually you end up with the whole kitchen sink, regardless of where you started.
I've had far more success hard-firewalling C++ into its own box where programmers can use whatever they can get running than trying to limit people to subsets.
This is probably a terrible idea. Remember that if you have written one member of a union, all other members remain public, yet accessing any of them in any way is undefined behaviour. This is made way worse by most compilers mostly choosing to let you do what you think it will. They just don't guarantee they always will or in all cases.
I believe you are mistaken. The C11 standard, section 6.5.2.3 "Structure and union members" pgf 6, says "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members." And that seems to be what's being used here.
The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14). It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.
What 6.5.2.3 simplifies is the use of unions of the type:
struct A { int type; DataA a; };
struct B { int type; DataB b; };
union U { struct A a; struct B b; };
union U u;
switch (u.a.type) ...
It's not what is being used here.
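A compilable version of that pattern, with hypothetical message types standing in for `A`, `B`, `DataA` and `DataB`. The shared leading `kind` member is the common initial sequence, and reading it through `m->ping` while `m->data` holds the value is the access C11 6.5.2.3p6 permits:

```c
/* Hypothetical message types. Both structs begin with the same member,
   so `kind` may be inspected through either one. */
enum msg_kind { MSG_PING, MSG_DATA };

struct ping_msg { enum msg_kind kind; unsigned seq; };
struct data_msg { enum msg_kind kind; unsigned len; const char *payload; };

union message {
    struct ping_msg ping;
    struct data_msg data;
};

static unsigned payload_length(const union message *m) {
    switch (m->ping.kind) { /* fine even when m->data was written last */
    case MSG_PING:
        return 0;
    case MSG_DATA:
        return m->data.len;
    }
    return 0;
}
```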
std::variant is designed to deprecate all legitimate uses of union
The post is about C, not C++. My comment stands, as the original post has two structs in a union, and they start the same way, so it’s exactly the case covered in the C11 Standard.
It's actually weirder than that. The C standard allows type punning through unions, but not because of the clause you mentioned. It allows it because of footnote 95:
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’)
This is broader than the common initial subsequence clause, and allows punning between completely different types, e.g. int, char[4], and float.
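A small sketch of punning between unrelated types as that footnote describes, with a hypothetical `float_bits` helper. It assumes `float` is IEEE 754 binary32, as on mainstream platforms; the static assertion only checks the size. A `memcpy` into a `uint32_t` would do the same job and is also valid in C++:

```c
#include <stdint.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* Write one member, read another: the stored bytes are reinterpreted
   as the new type, per the footnote quoted above. */
static uint32_t float_bits(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```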
You might ask, what is the point of the "common initial subsequence" rule then? It's to allow certain accesses that don't go directly through the union, so the compiler doesn't know for sure whether there's a union involved. Only problem is that all major compilers completely ignore this rule. [1] (But they do implement the first clause I mentioned, where the accesses do go through the union.)
Your response to GP is based on the C++ reference and his explicitly is based on the C standard. Your assertion that ‘ [t]he details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14)’ seems to directly conflict with the C11 standard. Also, your closing comment about std::variant is clearly only applicable to C++. I am just curious why you are using C++ when the article and GP are specifically addressing C?
You've mentioned this several times on this page, but this is still incorrect.
The C standard references "struct or union" all over the place because the two are so similar. The distinction is of course made clear in multiple places, but one that seems relevant here is:
> As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap. (ISO/IEC 9899:201x, §6.7.2.1, #6)
That's it. There's nothing about undefined behavior if you access one member and then another later. In fact there's even a paragraph which mentions doing just that:
> The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. (ISO/IEC 9899:201x, §6.7.2.1, #16)
A pointer to the union points to each of its members, and can be dereferenced to access it.
std::variant is not used in C; C and C++ are two different languages.
> The size of a union is sufficient to contain the largest of its members.
Correct me if I'm wrong, but there is no part of the C spec that says this:
When initializing a union member that is smaller than the largest member, the remaining bytes will always automatically be initialized to zero.
If I'm right then the following caveat must be added to your statement:
> A pointer to the union points to each of its members, and can be dereferenced to access it.
... if and only if the member which was originally initialized is at least as large as the other member being accessed.
In other words, if you write your program in a way that ensures it will only compile when all union members are exactly the same size, and you have mandatory tooling to make sure that any changes to said union follow the same rule by force of compilation errors, then and only then can you claim what you claimed without the threat of undefined behavior.
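A sketch of that "only compiles if every member fills the union" guard, using a hypothetical `union reg32`. `sizeof` doesn't evaluate its operand, so the null-pointer member expression is a common way to name a member's size in a constant expression:

```c
#include <stdint.h>

/* Hypothetical register view: one 32-bit word or its four bytes. */
union reg32 {
    uint32_t word;
    uint8_t  bytes[4];
};

/* Refuse to compile if a later change makes any member smaller than
   the union, which would leave bytes outside the written member. */
_Static_assert(sizeof(((union reg32 *)0)->word) == sizeof(union reg32),
               "word must fill the union");
_Static_assert(sizeof(((union reg32 *)0)->bytes) == sizeof(union reg32),
               "bytes must fill the union");
```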
Like the parent poster, when I read the article I assumed that there was no conceivable reason to ever use this feature in a real C program. Let me just say that I'm pleasantly surprised to be proven wrong!