Learning that you can use unions in C for grouping things into namespaces

	Learning that you can use unions in C for grouping things into namespaces (utcc.utoronto.ca)
	167 points by deafcalculus 1826 days ago

11 comments

10000truths 1826 days ago

Anonymous nested structs are also quite useful for creating struct fields with explicit offsets:

    #include <stdio.h>
    #include <stddef.h>  // offsetof
    #include <stdint.h>
    
    // Generate a uniquely named padding member of the given size.
    #define YDUMMY(suffix, size) char dummy##suffix[size]
    #define XDUMMY(suffix, size) YDUMMY(suffix, size)
    #define PAD(size) XDUMMY(__COUNTER__, size)
    
    struct ExplicitLayoutStruct {
        union {
            struct __attribute__((packed)) { PAD(3); uint32_t foo; };
            struct __attribute__((packed)) { PAD(5); uint16_t bar; };
            struct __attribute__((packed)) { PAD(13); uint64_t baz; };
        };
    };
    
    int main(void) {
        // offset foo = 3
        // offset bar = 5
        // offset baz = 13
        // offsetof yields a size_t, so print it with %zu
        printf("offset foo = %zu\n", offsetof(struct ExplicitLayoutStruct, foo));
        printf("offset bar = %zu\n", offsetof(struct ExplicitLayoutStruct, bar));
        printf("offset baz = %zu\n", offsetof(struct ExplicitLayoutStruct, baz));
        return 0;
    }

WalterBright 1825 days ago

Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.

tpoacher 1825 days ago

I hear your quote, but it's only a quote.

A macro is effectively preprocessing facilitated by the language. You could always preprocess externally if you wanted, and there's nothing stopping you from doing that in the "Powerful Language (TM)" either.

Now whether people use macros and preprocessing usefully is another question, but not one to which the answer is "abolish macros for more language features". When used correctly, macros ARE power.

10000truths 1825 days ago

Macros are useful, as long as they're used sparingly. I think that in this case, it's used well - the struct is still perfectly readable, and the sole purpose of it is to make it so that you don't have to manually name the dummy fields. But you could totally just write out dummy1, dummy2, dummy3 etc. yourself if you want to get rid of the macros.

WalterBright 1825 days ago

> Macros are useful, as long as they're used sparingly.

Everybody says that. Everybody believes it. And everybody goes to town making a rat's nest with macros, just like that snarl of cables under my desk that resist all attempts to make it nice.

Myself included. I've even written an article about clever C macros. Look, ma! I was so proud of myself.

But then I got older. I started replacing the macros in my C code with regular code. It turns out they weren't that necessary at all. I liked the C code a lot better when it didn't have a single # in it other than #include.

throwaway17_17 1825 days ago

I want to be clear about your meaning, because I don’t know if I’m reading your comment correctly. Are you referring explicitly to syntax based, preprocessor macros? Or does your comment extend to other metaprogramming techniques? I am inclined to think you mean the first considering the amount of emphasis on generic programming in D? Just curious.

WalterBright 1825 days ago

I'm referring to both syntax based (AST) macros, and text based (preprocessor) macros. The latter, of course, are much worse.

An example of the former is so-called "expression templates" in C++. I've seen them used to create a regular expression language using C++ expression templates. The author was quite proud of them, and indeed they were very clever.

However nice the execution, the concept was terrible. There was no way to visually tell that some ordinary code was actually doing regular expressions.

C++ expression templates had their day in the sun, but fortunately they seem to have been thrown onto the trash pile of sounds-like-a-good-idea-but-oops.

(I wrote an article showing how to do expression templates in D, mainly to answer criticisms that D couldn't do it, not because it was a good idea.)

d110af5ccf 1825 days ago

> Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.

> I'm referring to both syntax based (AST) macros ...

This surprises me greatly. Various lisps are among the most powerful languages I know of and a large part of the reason is macros coupled with their ability to execute arbitrary code at compile time (which itself uses additional macros, which in turn invoke more code, and so on). What's your take on this?

(Continuations are also pretty nice ...)

WalterBright 1825 days ago

I'm a noob when it comes to Lithp. But I'm told what happens with their macros is the language is fairly unusable until you write lots of macros. The macros then become your personal undocumented wacky language, which nobody else is able to use.

I've seen this happen with assembler macro languages, too.

> most powerful

It's like putting a 1000 hp motor in a car. Its main use is to wreck the car and kill the driver.

BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.

shaklee3 1825 days ago

Can you link to it? We're using expression templates on a new library and I find it useful.

acehreli 1824 days ago

I don't know what code Walter is referring to but I had similar experiences during a presentation of the Boost Spirit library (at our Silicon Valley C++ Meeting; must be about a decade ago now).

The speaker was proud and beaming for showing how powerful C++ is and the audience was in awe.

I was incredulous! Jaw open! The "solution" was horrible with a bunch of workarounds for a bunch of shortcomings. It was a "the emperor does not have clothes" moment for me.

Boost Spirit may be better today with newer C++ features; I don't know.

WalterBright 1825 days ago

Oh man, you sure know how to drive a stake in my heart! What have I done! Some curses should just not be uttered. I'm not sure where it is on my backups.

But it was done just like you'd do it in C++. The same thing, just with D syntax.

Spivak 1825 days ago

Doesn’t work in a lot of cases unfortunately. If you’re writing a library designed to be consumed by other languages you’re stuck with writing C ABI compatible code. That code can be written in other languages that can “extern” it, but it puts limits on what’s possible in those libraries.

WalterBright 1825 days ago

One thing I do is to make my own interface to the 3rd party library with the ugly interface. Then that interface file is the only one that sees it.

It also may be true that one can't replace every use of the preprocessor. That shouldn't stop replacing what one can.

cryptonector 1825 days ago

You might as well have written that "any time you're reaching for C, it's time to reach for a more powerful language".

But if -sadly- you must use C, metaprogramming using macros is not a terrible thing.

WalterBright 1825 days ago

> is not a terrible thing

It is a terrible thing. It's possible to do without them, and you'll like your code better. Your symbolic debugger and syntax directed editor will work properly. The poor schlub who has to fix the bugs in your code after you leave will be grateful. Your spouse will be happy and your children will prosper.

For example,

    #define foo(x) ((x) + 1)

replace with:

    int foo(int x) { return x + 1; }

The compiler will inline it for you. There's no penalty.

    #define FOO 3

Replace with:

    enum { FOO = 3 };

or:

    const int FOO = 3;

Replace:

    #if FOO == 3
    bar();
    #endif

with:

    if (FOO == 3) bar();

The optimizer will delete bar() if FOO is a manifest constant that is not 3.

cryptonector 1825 days ago

Having made a significant contribution to Simon Tatham's PuTTY, I have to respectfully disagree. At the time the entire SSHv2 key exchange and user authentication protocols were implemented in a single function using a massive Duff's device (for asynchrony) implemented with C metaprogramming macros. It was a surprisingly pleasant experience.

pjmlp 1825 days ago

There is a reason why such clevernesses are forbidden by security standards like MISRA-C.

zerr 1825 days ago

Agree except the last example - the clarity of intent that you want a conditional compilation is lost. Also, non-optimized debug builds are affected.

CamperBob2 1825 days ago

I think it was RMS who argued that conditional compilation should be considered harmful in general. Code that you put in an inactive #if(def) block will not be maintained, and is basically guaranteed to rot. If it's needed in the future, it'll likely have to be rewritten from scratch.

According to this stance, any code that's suppressed by the C preprocessor should either be written in an if {} statement so that it will at least continue to compile as the surrounding code changes, or be replaced with comments describing what it does (or did), if it's important enough to keep track of.

Can't really think of many good counterarguments to this. Machine dependence might be one, but then you could argue that the preprocessor is being used to cover up for an inadequate HAL.

WalterBright 1825 days ago

> clarity of intent

Conditional compilation is an optimization, not a semantic intent.

> non-optimized debug builds are affected

The code size will be larger. Doesn't matter.

shultays 1825 days ago

> The optimizer will delete bar() if FOO is a manifest constant that is not 3.

Yes, but the compiler won't delete it and complain if bar is not defined. if constexpr is not a direct replacement for if macro

WalterBright 1825 days ago

> the compiler won't delete it and complain if bar is not defined

    extern void bar();

will define it to the satisfaction of the compiler. The linker won't complain if the compiler removes the call to it.

gumby 1825 days ago

One of the very few things from C that I miss in C++ is anonymous structs and enums. I really don’t understand why they are not allowed.

That is, C style enums don’t have to have a name but “type safe” (enum class) ones do. One classic use is to name an otherwise boolean option in a function signature; there’s typically no need to otherwise name it.

C++ incompatibly requires a name for all struct and class declarations, again a waste when you will only have a single object of a given type.

WalterBright 1825 days ago

> I really don’t understand why they are not allowed.

I don't, either. Such were in D from 2000 or so.

I also don't understand why `class` in C++ sits in the tag name space. I wrote Bjarne in the 1980s asking him to remove it from the tag name space, as the tag name space is an abomination. He replied that there was too much water under that bridge.

D doesn't have the tag name space, and in 20 years not a single person has asked for it.

This did cause some trouble for me with ImportC to support things like:

    struct S { ... };
    int S;

but I found a way. Although such code is an abomination. I've only seen it in the wild in system .h files.
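
A minimal sketch of what the tag name space permits in C: struct tags live in a separate name space from ordinary identifiers, so the two declarations below coexist. The member `value` and the helper `tag_demo` are made up for illustration.

```c
/* `struct S` (a tag) and `S` (an ordinary identifier) do not collide in C. */
struct S { int value; };
int S = 5;

/* Hypothetical helper that uses both names side by side. */
int tag_demo(void) {
    struct S s = { 7 };   /* the tag, always written with `struct` */
    return s.value + S;   /* the plain int variable */
}
```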

Gibbon1 1825 days ago

The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.

I think C++ though is adding them.

What I'd like in C is designated function parameters.

  // these the same
  bar(.a = 10, .b = 12);
  bar(.b = 12, .a = 10);

saagarjha 1825 days ago

I suspect that to many C++ programmers, most initializations of structs have unpredictable side effects because of how complex they are ;)

wott 1825 days ago

You can somewhat fake it by replacing your function's parameter list with a single struct parameter.

    struct bar_arguments {
       int a, b;
    };
    int bar(struct bar_arguments args) { return 2*args.a + args.b;}

    #define bar(...) bar((struct bar_arguments) {__VA_ARGS__})

    // usage (will print 32 three times)
    printf("%d\n", bar(10, 12));
    printf("%d\n", bar(.a = 10, .b = 12));
    printf("%d\n", bar(.b = 12, .a = 10));

The main drawback is that all parameters are now optional: it will not complain if you forget to assign all parameters, it will silently set them to 0 :-/

    printf("%d\n", bar(10));
    printf("%d\n", bar(.a = 10));
    printf("%d\n", bar(.b = 12));

will print 20, 20 and 12.

You can change those "default values", but then calling the function with regular positional parameters is impaired :-/
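
A sketch of that "default values" variant, assuming C99 designated initializers. The defaults are listed first so that designated initializers supplied at the call site override them. The name `bar_impl` and the default values 1 and 2 are hypothetical, and a compiler may warn about the overridden initializers (GCC has `-Woverride-init` for this).

```c
struct bar_arguments {
    int a, b;
};

/* The real function takes a single struct argument. */
int bar_impl(struct bar_arguments args) { return 2 * args.a + args.b; }

/* Defaults come first; later designated initializers from the call override them. */
#define bar(...) bar_impl((struct bar_arguments){ .a = 1, .b = 2, __VA_ARGS__ })
```

With this macro, `bar(.b = 12)` uses the default `a = 1`. A positional call such as `bar(10, 12)` no longer fits, because the positional values would continue after `.b`, past the last member.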

gumby 1825 days ago

> The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.

I don't understand. How would struct or class initialization be any different from simply doing, say, `for (auto& a : { x, y, z }) frob (a);` which is perfectly legal?

Gibbon1 1823 days ago

I didn't mention. I think the thought was with designated initializers the order of initialization is what? The order of the elements of the struct? Or the order where it's initialized. In C probably matters little as side effects are usually blatant. C++ I think cryptic side effects are common.

cjaybo 1825 days ago

> C++ incompatibly requires a name for all struct and class declarations

You're right about "enum class", but anonymous classes and structs are perfectly valid in C++:

https://godbolt.org/z/7MbcqhnoK

dataflow 1825 days ago

Try

  struct S { struct { int x; }; };

under -pedantic and you'll get

  warning: ISO C++ prohibits anonymous structs [-Wpedantic]

midjji 1825 days ago

Pedantic is for the older C++ standard; it's not pedantic for the later ones, e.g. C++11. I think this changed.

junon 1825 days ago

No, pedantic is for disabling compiler extensions. You still need to explicitly specify a standard.

dataflow 1825 days ago

Well that blows my mind, I never realized pedantic ignores the language setting. Is this the only case where it does that?

jcelerier 1825 days ago

... ? That's definitely not true, both anonymous structs and enums work fine in c++.

https://wandbox.org/permlink/ICaQJXCaVOt9mXdP

gumby 1825 days ago

No, they are forbidden by the standard (take a look at cppreference). Some compilers implement the C behavior as an extension, so tell your compiler to follow the standard strictly.

I don’t use extensions, even convenient ones, as I have to be able to run my code on a variety of compilers. If you don’t have to do that, some extensions (like this one) are really handy.

jcelerier 1825 days ago

From the standard, the enum-name is marked as optional in the enum grammar: in [dcl.enum](https://eel.is/c++draft/dcl.enum#11) ; it is also referenced e.g. in [dcl.dcl]:

> An unnamed enumeration that does not have a typedef name for linkage purposes ([dcl.typedef]) and that has a first enumerator is denoted, for linkage purposes ([basic.link]), by its underlying type and its first enumerator; such an enumeration is said to have an enumerator as a name for linkage purposes.

And for classes/structs, [class.pre](https://eel.is/c++draft/class.pre#def:class,unnamed) has explicit wording:

> A class-specifier whose class-head omits the class-head-name defines an unnamed class.

So both are entirely fine (and likewise, unions are too).

Note that my links are for the current draft, but I just checked and this was already the case as far back as C++11. So I wonder where this persistent myth seems to come from.

gumby 1824 days ago

This is awesome. I referenced cppreference, but that is not authoritative. Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!

The part of enums you quoted was C-compatible enums; anonymous scoped enums are explicitly forbidden: "The optional enum-head-name shall not be omitted in the declaration of a scoped enumeration" (dcl.enum 2).

Sigh. I will send in a clarification at least on the class/struct/union side. Ideally the grammar would be fixed rather than that paragraph.

The draft I looked at is https://timsong-cpp.github.io/cppwp/n4868/ (2020-10-18, shortly after the standard was approved).

midjji 1825 days ago

Use an enum in a namespace, or anonymous namespace.

gumby 1825 days ago

This is an example of the desired use case:

    static obj& some_call (obj& o, enum struct { abandon, save } disposition) { ... };

This is a common case (and should be more common) to avoid using an obscure boolean flag, which can lead to bugs. It shouldn't need a name.

An anonymous namespace just means the name itself won't leak out; under C++ rules I need the name even to specify the enum tag, which is absurd.

nybble41 1824 days ago

Each instance of "enum struct { abandon, save }" would denote a different type, yes? How would you write a compatible definition to go with your prototype?

gumby 1823 days ago

I don’t care that they are different types; if anything that would be a feature.

The point is to prevent the “mysterious bool arguments” class of error.

The question is if ADL could infer the scope of the enum, as template instantiation can now infer the right thing and doesn't always need the <T> notation.

oshiar53-0 1825 days ago

IMO you can still be explicit about field offsets by writing the struct in a usual way, and using static assertions to ensure offsets match the intended layout.
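
A minimal sketch of the static-assertion approach, assuming C11 (`_Static_assert`) and a typical ABI. The struct `Header` and its members are hypothetical. Padding is written out by hand and the assertions fail the build if the layout ever drifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: plain members, with padding spelled out by hand. */
struct Header {
    uint8_t  kind;        /* offset 0 */
    uint8_t  reserved[3]; /* offsets 1..3 */
    uint32_t length;      /* offset 4 */
    uint64_t payload;     /* offset 8 */
};

/* Compile-time checks: a mismatch is a build error, not a runtime surprise. */
_Static_assert(offsetof(struct Header, length) == 4, "length must be at offset 4");
_Static_assert(offsetof(struct Header, payload) == 8, "payload must be at offset 8");
_Static_assert(sizeof(struct Header) == 16, "unexpected struct size");
```

Unlike the packed-union version, this only covers offsets that respect each member's natural alignment, and it cannot express overlapping fields.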

nyanpasu64 1825 days ago

Do foo and bar deliberately overlap?

10000truths 1825 days ago

Yes, I was looking to demonstrate the flexibility of the approach by including overlapping fields.

midjji 1825 days ago

If you write to either, accessing the other, even on the overlap, is undefined behaviour

10000truths 1825 days ago

Type punning/aliasing with unions is well defined in gcc. Linus even has a humorous rant about it on the topic:

https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/

Sure, it's compiler-specific, but I'm already using `__attribute__((packed))` anyways.

midjji 1825 days ago

All undefined behaviour is well defined for each compiler. What it really means is implementation defined and subject to change without notice or documentation with every compiler version, host OS, set of enabled flags, or a thousand other things. Why use an approach which strictly relies on specific versions of specific compilers, rather than a completely portable and standard-compliant struct with a char array and a few char pointers? Or if you want a convenient interface and aren't explicitly writing for the kernel, switch to a restricted subset of C++ and do it right?

midjji 1825 days ago

There are two kinds of undefined behaviour being invoked in using this. It's a horrible idea and a horrible code smell; get rid of it if you ever see something like this.

10000truths 1825 days ago

I don't see any undefined behavior here. As I mentioned below, gcc explicitly documents type punning via unions as being well defined. But yes, this is compiler specific and is not guaranteed to work elsewhere.

formerly_proven 1825 days ago

Accessing packed struct members works fine on x86, but will blow up at runtime or do weird things on platforms which don't support unaligned loads or stores.

The correct way to access packed structs is through memcpy, just like you'd access any other potentially unaligned object.
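
A minimal sketch of the memcpy approach. The helper names `load_u32` and `store_u32` are made up for illustration. Going through `memcpy` never forms a misaligned `uint32_t` pointer, and compilers commonly turn the call into a plain load or store where the target allows it.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from any byte address, aligned or not. */
uint32_t load_u32(const unsigned char *p) {
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Write a uint32_t to any byte address, aligned or not. */
void store_u32(unsigned char *p, uint32_t value) {
    memcpy(p, &value, sizeof value);
}
```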

10000truths 1825 days ago

For architectures where unaligned accesses are illegal, gcc will generate multiple load/store instructions when accessing packed struct fields by name. The main caveat to look out for is taking the address of a packed struct member and then dereferencing it.

formerly_proven 1825 days ago

Ah, right. Thanks for the correction.

midjji 1825 days ago

There is absolutely undefined behaviour there. Undefined behaviour is defined not as nasal daemons but as: The compiler implementer does not guarantee that this behaviour will be hardware, circumstance, compiler version, or os consistent, nor that we will warn if we change this.

Packed is technically not undefined behaviour, but it is certainly a trap. Especially because the compiler macros lead people to make defines which select packed by compiler automatically. Then the special case of an unrecognized compiler is just left empty, meaning it compiles but no longer does what you think.

fifjdynb 1825 days ago

You don't get to decide what UB means. It really does mean nasal demons are a possibility: all bets are off when you run that executable. Use of the term "undefined behaviour" to mean something else may be on the increase, unfortunately (https://mars.nasa.gov/technology/helicopter/status/298/what-...), but if we're talking about C, its meaning is fixed.

midjji 1824 days ago

I didn't, the deterministic nature of computer programs guarantees that what I wrote is the actual outcome.

flohofwoe 1826 days ago

I'm using anonymous nested structs extensively for grouping related items, but I consider the extra field name a feature, not something that should be hidden:

https://github.com/floooh/sokol-samples/blob/bfb30ea00b5948f...

(also note the 'inplace initialization' which follows the state struct definition using C99's designated initialization)

kevin_thibedeau 1825 days ago

The result is uglier and less maintainable than a pair of macros. Or just stop trying to hide syntax. This is ultimately on the same level as typedefing pointers.

remram 1826 days ago

The first example seems wrong, instead of `struct sub { ... };` what is meant is `struct { ... } sub;`

siebenmann 1825 days ago

You're right; thanks for noticing and I've updated the first example. My C is a bit rusty these days and I didn't check it with a compiler the way I should have.

(I'm the author of the linked-to article.)

sesuximo 1826 days ago

Doesn’t matter for C, but in C++ this could make your constexpr functions UB since you can only use one member of a union in constexpr contexts (the “active” member).

ferdek 1826 days ago

In other words: please always be wary of differences in C and C++, for instance type punning [0].

[0] https://stackoverflow.com/a/25672839

pjmlp 1826 days ago

Triggering UB is a compiler error in constexpr code.

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...

sesuximo 1826 days ago

True, you’ll hopefully get a compiler error.

pjmlp 1826 days ago

In C++ we have had namespaces for 30 years now, no need for such tricks.

tialaramex 1826 days ago

Hmm. How do C++ namespaces help with the structure naming problem in this example? They seem completely orthogonal.

C++ namespaces are a way to avoid library A's symbol "cow" clashing with library B's symbol "cow" without everything being named library_a_cow and library_b_cow all over the place which is annoying. I agree C would be nicer with such a namespace feature.

However this technique is about what happens when you realise your structure members x and y should be inside a sub-structure position, and you want both:

    d = calculate_distance(s.x, s.y); // Old code

and

    d = calculate_distance(s.position.x, s.position.y); // New

... to work while you transition to this naming.
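
A sketch of a struct that supports both spellings at once, assuming C11 anonymous structs and unions. The type `sprite` and the helpers `sum_old` and `sum_new` are hypothetical. Both structs inside the union have the same layout, so `s.x` and `s.position.x` name the same storage.

```c
/* Hypothetical struct: x and y are reachable both directly and via `position`. */
struct sprite {
    int id;
    union {
        struct { int x, y; };           /* anonymous: s.x, s.y */
        struct { int x, y; } position;  /* named: s.position.x, s.position.y */
    };
};

/* Old-style access. */
int sum_old(const struct sprite *s) { return s->x + s->y; }

/* New-style access to the same storage. */
int sum_new(const struct sprite *s) { return s->position.x + s->position.y; }
```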

pjmlp 1826 days ago

You can use inline namespaces for versioning symbols.

https://www.foonathan.net/2018/11/inline-namespaces/

tialaramex 1825 days ago

First of all, C++ 11 may feel like thirty years ago, and certainly some of its proponents look thirty years older than they did at the time, but it was only ten years ago. C++ namespaces date to standardisation work (so after the 1985 C++ but before the 1995 standard C++) but they don't get this job done. Inline namespaces are a newer feature.

Secondly this technique does something different. The C hack doesn't touch the old code. But this "inline namespace" trick means old code has to explicitly opt into this backward compatibility fix or else it might blow up.

Lastly, I didn't try this, but presumably you did. Are the two separately namespaced classes the "same thing" as far as type checking is concerned? A vital feature of this union trick is that it's just one structure, it type checks as the same structure because it is the same structure. At a glance, I think the C++ solution results in two types with similar names, so that would fail type checking.

pjmlp 1825 days ago

Ah, another of those threads, ok let's set the years straight.

Yes, inline namespaces were only introduced in C++11, about 10 years ago, now let's dive into the article.

"Learning that you can use unions in C for grouping things into namespaces"

Grouping into namespaces, so when did C++ get said feature?

ANSI/ISO C++98 was released to the world in September 1998, which makes around 23 years, or 24 years if we consider the release of C++ compilers already supporting it the year before, like Borland C++.

This C hack definitely does touch old code, as it requires the code to be written to take advantage of the technique and is also touched again, when changes to the structs are required.

And naturally recompilation.

With inline namespaces, assuming recompilation, you can naturally also change which set of identifiers and type aliases are visible by default.

thaumasiotes 1825 days ago

Based on the writeup, this technique isn't really about enabling you to start writing `s.position.x` where the old code would have written `s.x`. If that were all you wanted, you'd just keep writing `s.x`. It's about enabling you to write `s.x` everywhere, in old code and new code, while also being able to pass `s.position` to memcpy calls. You're never supposed to write `s.position.x`.
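
A sketch of that use, assuming C11 anonymous members. The type `particle` and the helper `copy_position` are made up for illustration. Code keeps writing `p.x` and `p.y`, while the named `position` member gives `memcpy` a single object covering exactly that group of fields.

```c
#include <string.h>

/* Hypothetical struct: `position` names the group, x and y stay directly accessible. */
struct particle {
    int id;
    union {
        struct { int x, y; };
        struct { int x, y; } position;
    };
};

/* Copy only the grouped fields from src to dst, leaving dst->id alone. */
void copy_position(struct particle *dst, const struct particle *src) {
    memcpy(&dst->position, &src->position, sizeof dst->position);
}
```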

comex 1826 days ago

C++ namespaces are unrelated to this. They don’t accomplish the same thing.

pjmlp 1826 days ago

The goal of inline namespaces is exactly to allow for migrating libraries across versions.

comex 1825 days ago

That's nice, but the blog post doesn't say anything about migrating libraries across versions? Looking at the comment thread, I see tialaramex's sibling comment suggested the blog post was about migration, but it's not.

I suppose migration is another possible use case for the union trick, and for that case C++ inline namespaces can be used as part of an implementation that achieves a broadly similar goal, but in a completely different way. As tialaramex notes, with inline namespaces you still end up with two different types.

midjji 1825 days ago

Constexpr unions are the sane/safe way to use them. It's great, because when you access a member which isn't the last one written, constexpr will explicitly prevent it at compile time. Whereas all other examples here are explicitly undefined behaviour!

bruce343434 1825 days ago

Imo this is not “perverse”. In my vector library I alias a vec3 as float x,y,z and float[3] using this technique.
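
A sketch of what such a vector type might look like, assuming C11 anonymous structs. The names `vec3` and `vec3_dot` are hypothetical and not taken from the commenter's library. The static assertion guards the assumption that the three floats are laid out without padding.

```c
typedef union {
    struct { float x, y, z; };  /* named components */
    float v[3];                 /* the same storage viewed as an array */
} vec3;

/* Assumes no padding between x, y and z; the build fails if that is wrong. */
_Static_assert(sizeof(vec3) == 3 * sizeof(float), "vec3 has unexpected padding");

/* Loops over the array view; callers can still use .x, .y and .z. */
float vec3_dot(const vec3 *a, const vec3 *b) {
    float sum = 0.0f;
    for (int i = 0; i < 3; i++)
        sum += a->v[i] * b->v[i];
    return sum;
}
```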

midjji 1825 days ago

This is also known as the most common invocation of undefined behaviour in game programming. If you do this, write to y, then read from [1], you are invoking undefined behaviour, and compilers doing different things here between Windows, Linux, Mac, and different compiler versions is a common cause of "why isn't my game working right on XXX, it works fine on YYY" questions.

genocidicbunny 1825 days ago

Type punning is not undefined, it's implementation defined in C. In practice, every major C compiler will be fine with type punning, though it may disable some optimizations.

The story is different in C++, but in practice many compilers support it the same as in C. Especially for games, where VC++ (PC, Xbox) and Clang (PS4/PS5) are the most commonly used compilers, it also works as expected. The trick is to only use type punning for trivial structs that don't invoke complications like con/de-structors or operators. The GP's example of a Vec3 struct that puns float x,y,z with float[3] is a very common one in games.

midjji 1824 days ago

Something being very common and a very common source of portability issues isn't exactly contradictory. It's a bad idea, and it is outright being taught in modern game programming courses that it's a bad idea, but common in older guides, specifically because it caused so many problems. I'm pissed at this specific construct because I got it handed to me in a huge game library and had to spend a long time figuring out why it wasn't working in rare, but important cases.

genocidicbunny 1824 days ago

But my point is that on the platforms that matter, it's not really a source of portability issues, and not a problem. For gamedev, anything outside of VC++ and Clang are niche and thus largely ignored.

saagarjha 1825 days ago

I don’t see the undefined behavior here?

shultays 1825 days ago

Op probably means cpp, where it is indeed undefined behavior. Not sure about C. I doubt that this would cause a "my game does not work on XXX" though. Is there really a compiler out there that will handle such abuse differently?

midjji 1824 days ago

Yes, it's undefined behaviour in both C and C++. Yes, a number of compilers treat this differently; it's also poorly supported on custom hardware using standard compilers like gcc. So compiling for some mobile device with slightly custom ... good luck.

bruce343434 1824 days ago

It is not undefined behavior in C. And the OP was about C.

bruce343434 1825 days ago

I don't mean cpp

PaulHoule 1825 days ago

The C programming language, brought to you by Cthulhu.

You don't need eval(), you've got strcpy()!

rightbyte 1826 days ago

I don't regard this as a "perverse" hack. If I ever do embedded memory mapped stuff in C11 this is way too tempting.

midjji 1825 days ago

You are practically guaranteed to invoke undefined behaviour if you do. Just use a map on a std::array of e.g. std::byte

dexterhaslem 1825 days ago

they said C11 tho

Subsentient 1825 days ago

Bleurgh. I have a deep soft spot for C, and I'm known to get twisted pleasure from using obscure language features in new ways to annoy people, but this is a level of abuse that even I can't get behind. If you need namespacing, use C++. As much as I love C, it's terrible for large projects.

vbezhenar 1825 days ago

Linux kernel is large project and clearly C is sufficient for it, given the fact that migrating to C++ would probably be very easy (not using all C++ features, but just selected ones), yet it did not happen.

I think that C++ is better than C, but C is not that bad, even for large projects.

dkersten 1825 days ago

> Linux kernel is large project and clearly C is sufficient for it

Sure, and operating systems have been written in assembly too. The question is whether it would be better than just sufficient if Linux were written in C++, today (ie C++17 or 20, not something old). Switching now probably wouldn't be feasible (even ignoring technical reasons, the kernel developer community is familiar with the C codebase and code standards and bought into it), but if Linux were started today, would it be a better choice?

Maybe the answer is still no and C would still be chosen, but the choice today is very different than it was when Linux was started. Of course, maybe Rust or something would be chosen today instead.

Bayart 1825 days ago

Cantrill did a talk in which he touches on C, C++ and Rust for systems programming [1].

His tl;dr being that Rust feels very much like a proper systems programming language, and more of a « better C » than C++. I don't entirely know what to make of it, but my instinct is that something like C++ with such an opportunity space for baroque concoctions (leading to an obsession with design patterns) is just playing with fire.

[1] https://www.youtube.com/watch?v=LjFM8vw3pbU

link

adtac 1825 days ago

the kernel has to live with the choices it made in the 90s, you don't

link

midjji 1825 days ago

Yeah they should have upgraded to some restricted subset of C++ or new restrictive language ages ago. I mostly buy the arguments against having exceptions, perhaps even against polymorphism in general, but the argument against destructors, or atomics... hell no.

link

humanrebar 1825 days ago

> ...against polymorphism...

C has polymorphism. Inheritance-based virtual dispatch is just one kind of polymorphism. It's common to wire up polymorphism in C with bespoke data structures using tagged unions or function pointers. Changing an implementation at link time is even a form of polymorphism.
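A minimal sketch of the two hand-wired approaches mentioned above: a tagged union dispatched with a switch, and a struct that carries a pointer to a table of function pointers. All names here (`shape`, `shape_ops`, `square`) are hypothetical and invented for illustration.

```c
/* Variant 1: tagged union. The tag says which union member is live. */
enum shape_kind { SHAPE_SQUARE, SHAPE_RECT };

struct shape {
    enum shape_kind kind;
    union {
        struct { int side; } square;
        struct { int w, h; } rect;
    } u;
};

/* Dispatch by switching on the tag. */
int shape_area(const struct shape *s) {
    switch (s->kind) {
    case SHAPE_SQUARE: return s->u.square.side * s->u.square.side;
    case SHAPE_RECT:   return s->u.rect.w * s->u.rect.h;
    }
    return 0; /* unknown tag */
}

/* Variant 2: function pointers, i.e. a hand-rolled vtable. */
struct shape_ops {
    int (*area)(const void *self);
};

struct square {
    const struct shape_ops *ops; /* points at the implementation table */
    int side;
};

int square_area(const void *self) {
    const struct square *sq = self;
    return sq->side * sq->side;
}

const struct shape_ops square_ops = { square_area };
```

With the second variant a caller only needs `obj.ops->area(&obj)` and never has to know the concrete type, which is roughly what virtual dispatch does behind the scenes.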

link

TeeMassive 1825 days ago

> Changing an implementation at link time is even a form of polymorphism.

I never truly appreciated how polymorphism can take so many different forms!

link

midjji 1824 days ago

yes, and that is generally worse

link

dathinab 1825 days ago

And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.

> C++ would probably be very easy

Not necessarily. Besides some small(?) problems due to C++ allowing "more magic optimizations" than C, they would have to switch to a subset of C++, and you would then need to communicate to all contributors that a lot of C++ things are not allowed. It might be easier to simply not use C++. I mean, if it were that easy, the kernel likely would have switched.

link

shakna 1825 days ago

> And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.

Actually, they use it themselves. [0]

[0] https://lwn.net/SubscriberLink/864521/d704bdcced0c5c60/

link

dathinab 1825 days ago

Yes, it's scary.

But it's also not used <to have namespacing> but to <improve on cross-field memory operation>.
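A rough sketch of what "cross-field memory operation" can look like, assuming C11 anonymous structs. The union overlays an anonymous struct (so the fields stay directly accessible) with an identically laid out named struct (so the whole group has an address and a size). The names `packet`, `coords` and `packet_clear_coords` are made up for this example and are not taken from the kernel.

```c
#include <string.h>

struct packet {
    int id;
    union {
        struct { int x; int y; };        /* anonymous: reached as p.x, p.y */
        struct { int x; int y; } coords; /* named: the same bytes as one object */
    };
    int flags;
};

/* Clear both grouped fields with one call whose size is tied to the group,
   instead of hand-computing a byte range that spans two separate members. */
void packet_clear_coords(struct packet *p) {
    memset(&p->coords, 0, sizeof(p->coords));
}
```

The point is that `sizeof(p->coords)` gives a bounds checker a well-defined object to check against, while existing code that uses `p.x` and `p.y` does not have to change.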

link

AlotOfReading 1825 days ago

A big issue with introducing C++ into a codebase is that it's incredibly hard to stick to a particular subset or standard. There's always a well-justified argument for the next standard or "just this one additional feature". Eventually you end up with the whole kitchen sink, regardless of where you started.

I've had far more success hard-firewalling C++ into its own box where programmers can use whatever they can get running than trying to limit people to subsets.

link

kktkti9 1825 days ago

People will make a mess of a large project regardless of the language.

link

midjji 1825 days ago

This is probably a terrible idea. Remember that once you have written one member of a union, all other members remain public, yet accessing any of them in any way is undefined behaviour. This is made way worse by most compilers mostly choosing to do what you think they will. They just don't guarantee they always will, or in all cases.

link

drfuchs 1825 days ago

I believe you are mistaken. The C11 standard, section 6.5.2.3 "Structure and union members" pgf 6, says "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members." And that seems to be what's being used here.
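For illustration, a small sketch of the common initial sequence case that paragraph describes. The types `msg_a`, `msg_b` and `msg` are hypothetical; the only thing that matters is that both structs start with a member of the same type.

```c
struct msg_a { int type; int payload; };
struct msg_b { int type; double payload; };

union msg {
    struct msg_a a;
    struct msg_b b;
};

/* 'type' belongs to the common initial sequence of msg_a and msg_b, so
   C11 6.5.2.3p6 permits inspecting it through either member, whichever
   struct the union currently holds. */
int msg_type(const union msg *m) {
    return m->a.type;
}
```

Here `msg_type` reads through `a` even when the caller stored a `struct msg_b`, and the union declaration is visible at the point of access, as the quoted text requires.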

link

midjji 1825 days ago

No: from https://en.cppreference.com/w/cpp/language/union.

The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14). It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

What 6.5.2.3 simplifies is the use of unions of the type:

    struct A { int type; DataA a; };
    struct B { int type; DataB b; };
    union U { A a; B b; };

    U u;
    switch (u.a.type) ...

It's not what is being used here.

std::variant is designed to deprecate all legitimate uses of union

link

drfuchs 1825 days ago

The post is about C, not C++. My comment stands, as the original post has two structs in a union, and they start the same way, so it’s exactly the case covered in the C11 Standard.

link

comex 1825 days ago

It's actually weirder than that. The C standard allows type punning through unions, but not because of the clause you mentioned. It allows it because of footnote 95:

> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’)

This is broader than the common initial subsequence clause, and allows punning between completely different types, e.g. int, char[4], and float.
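A short sketch of that kind of punning in C (not C++), assuming `float` is a 32-bit IEEE 754 type, which the standard does not guarantee. The function name `float_bits` is made up for this example.

```c
#include <stdint.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Store through one member, read through another. Per the footnote quoted
   above, the stored bytes are reinterpreted as the new type. */
uint32_t float_bits(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

On an IEEE 754 platform `float_bits(1.0f)` yields `0x3f800000`. The same code compiled as C++ falls under the different rules discussed elsewhere in this thread.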

You might ask, what is the point of the "common initial subsequence" rule then? It's to allow certain accesses that don't go directly through the union, so the compiler doesn't know for sure whether there's a union involved. Only problem is that all major compilers completely ignore this rule. [1] (But they do implement the first clause I mentioned, where the accesses do go through the union.)

[1] https://stackoverflow.com/questions/34616086/union-punning-s...

link

throwaway17_17 1825 days ago

Your response to GP is based on the C++ reference, while his is explicitly based on the C standard. Your assertion that ‘[t]he details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14)’ seems to directly conflict with the C11 standard. Also, your closing comment about std::variant is clearly only applicable to C++. I am just curious why you are using C++ when the article and GP are specifically addressing C?

link

thamer 1825 days ago

You've mentioned this several times on this page, but this is still incorrect.

The C standard references "struct or union" all over the place because the two are so similar. The distinction is of course made clear in multiple places, but one that seems relevant here is:

> As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap. (ISO/IEC 9899:201x, §6.7.2.1, #6)

That's it. There's nothing about undefined behavior if you access one member and then another later. In fact there's even a paragraph which mentions doing just that:

> The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. (ISO/IEC 9899:201x, §6.7.2.1, #16)

A pointer to the union points to each of its members, and can be dereferenced to access it.
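As a small illustration of that paragraph, here is a sketch with a hypothetical `union value`. It only reads back the member that was actually stored, so it does not depend on any punning rules.

```c
union value {
    int i;
    float f;
};

/* Per 6.7.2.1 #16, a pointer to the union, suitably converted, points to
   each of its members, so this cast yields a valid pointer to v->i. */
int read_as_int(union value *v) {
    int *ip = (int *)v;
    return *ip;
}
```

The same paragraph implies that `&v`, `&v.i` and `&v.f` all compare equal once converted to `void *`.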

std::variant is not used in C; C and C++ are two different languages.

link

jancsika 1825 days ago

> The size of a union is sufficient to contain the largest of its members.

Correct me if I'm wrong, but there is no part of the C spec that says this:

When initializing a union member that is smaller than the largest member, the remaining bytes will always automatically be initialized to zero.

If I'm right then the following caveat must be added to your statement:

> A pointer to the union points to each of its members, and can be dereferenced to access it.

... if and only if the member which was originally initialized is at least as large as the other member being accessed.

In other words, if you write your program in a way that ensures it will only compile when all union members are exactly the same size, and you have mandatory tooling to make sure that any changes to said union follow the same rule by force of compilation errors, then and only then can you claim what you claimed without the threat of undefined behavior.
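One way to get that "force of compilation errors" is a C11 static assertion on the member sizes. This is only a sketch with made-up types (`as_words`, `as_bytes`, `packet`). As far as I can tell, C11 6.2.6.1 says that bytes not corresponding to the stored member take unspecified values, which is weaker than undefined behaviour but still not something to rely on.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

struct as_words { uint32_t lo, hi; };
struct as_bytes { uint8_t b[8]; };

/* Compilation fails if someone changes one member without the other. */
static_assert(sizeof(struct as_words) == sizeof(struct as_bytes),
              "union members must be the same size");

union packet {
    struct as_words w;
    struct as_bytes b;
};

/* Every byte of the union is written via 'w', so reading any byte via 'b'
   never touches bytes left unspecified by a smaller member. Which byte of
   'lo' comes first depends on endianness. */
uint8_t first_byte(uint32_t lo) {
    union packet p;
    p.w.lo = lo;
    p.w.hi = 0;
    return p.b.b[0];
}
```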

link

adamnemecek 1826 days ago

Don't actually do this.

link

sp332 1826 days ago

The Linux kernel is using this for bounds checking. https://news.ycombinator.com/item?id=28015263

link

ufo 1825 days ago

Like the parent poster, when I read the article I assumed that there was no conceivable reason to ever use this feature in a real C program. Let me just say that I'm pleasantly surprised to be proven wrong!

link