by Negitivefrags 112 days ago
Why is it such a terrible idea?

No need to add complexity, dependencies and reduced performance by using these libraries.


Lots of reasons:

The code is not portable between architectures.

You can’t actually define your data structure. You can pretend with your compiler’s version of “pack” with regrettable results.

You probably have multiple kinds of undefined behavior.

Dealing with compatibility between versions of your software is awkward at best.

You might not even get amazing performance. mmap is not a panacea. Page faults and TLB flushing are not free.

You can’t use any sort of advanced data types — you get exactly what C gives you.

Forget about enforcing any sort of invariant at the language level.

I've written a lot of code using that method, and never had any portability issues. You use types with the number of bits in their names.

Hell, I've slung C structs across the network between 3 CPU architectures. And I didn't even use htons!

Maybe it's not portable to some ancient architecture, but none that I have experienced.

If there is undefined behavior, it's certainly never been a problem either.

And I've seen a lot of talk about TLB shootdown, so I tried to reproduce those problems but even with over 32 threads, mmap was still faster than fread into memory in the tests I ran.

Look, obviously there are use cases for libraries like that, but a lot of the time you just need something simple, and writing some structs to disk can go a long way.
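For concreteness, here is a minimal sketch of the "write some structs to disk" approach being discussed. The struct and function names (`save_record`, `save_record_write`, `save_record_read`) are made up for illustration, and the sketch assumes the file is read back by a build with the same padding and byte order as the one that wrote it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: fixed-width fields only, no pointers. */
struct save_record {
    uint32_t version;
    uint32_t score;
    uint64_t timestamp;
};

/* Dump the in-memory bytes as-is. Assumption: reader and writer
 * share the same struct layout and endianness. */
static int save_record_write(FILE *f, const struct save_record *r) {
    return fwrite(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}

/* Read the bytes straight back into the struct. */
static int save_record_read(FILE *f, struct save_record *r) {
    return fread(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}
```

Both functions return 0 on success and -1 on a short read or write. Nothing here validates the contents, so a `version` check after reading is the usual next step.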

Some people also don't use protective gear when going downhill biking; it is a matter of feeling lucky.
On the other hand some people have things to ward off evil demons, and aren't bothered by evil demons.

The parent has actually done the thing, and found no issues. I don't think you can hand-wave that away with a biased metaphor.

Otherwise you get 'Goto considered harmful' and people not using it even when it fits.

As proven by many languages without native support for plain old goto, it isn't really required when proper structured programming constructs are available, even if it happens to be a goto under the hood, managed by the compiler.
My point is it's bad debating style. 'Everyone knows C is bad for all kinds of reasons; ergo, even when someone presents their own actual experience, I can respond with a refrain that sounds good.'

Not using goto because you've heard it's always bad is the same kind of thing. Yes, it has issues, but that isn't a reason to brush off anyone who has actual valid uses for it.

C allows most of this, whereas C++ doesn't allow this kind of pointer aliasing without compiler flags, tricks, and problems.

I agree you can certainly just use bytes of the correct sizes, but often to get the coverage you need for the data structure you end up writing some form of wrapper or fixup code, which is still easier and gives you the control versus most of the protobuf like stuff that introduces a lot of complexity and tons of code.

__attribute__((may_alias, packed)) right on the struct.
Check your generated code. Most compilers assume that packed also means unaligned and will generate unaligned load and store sequences, which are large, slow, and may lose whatever atomicity properties they might have had.
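A small sketch for anyone who wants to check this themselves. The struct and function names are invented, `__attribute__((packed))` is the GCC/Clang extension, and the offsets in the assertions assume a mainstream ABI where `uint32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads after `tag` so `value` is aligned. */
struct natural {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout (GCC/Clang extension): no padding, so `value`
 * sits at offset 1 and the whole struct has alignment 1. */
struct __attribute__((packed)) squeezed {
    uint8_t  tag;
    uint32_t value;
};

static_assert(offsetof(struct natural, value) == 4, "expected padding before value");
static_assert(offsetof(struct squeezed, value) == 1, "expected no padding");
static_assert(_Alignof(struct squeezed) == 1, "packed struct should be byte-aligned");

/* Two otherwise identical accessors, kept non-static so both
 * show up in the assembly listing for comparison. */
uint32_t read_natural(const struct natural *p)   { return p->value; }
uint32_t read_squeezed(const struct squeezed *p) { return p->value; }
```

Compiling with something like `gcc -O2 -S` for your target and comparing the two accessors shows what the packed load costs. On targets that handle unaligned loads in hardware the two may look the same, while strict-alignment targets tend to get a byte-by-byte sequence.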
That is not C, but a non-standard extension and thus not portable.
> non-standard extension and thus not portable

Modern versions of standard C aren't very portable either: unless you plan to stick to the original version of K&R C, you have to pick and choose which implementations you plan to support.

That seems highly unlikely. Let's assume that all compilers use the exact same padding in C structs, that all architectures use the same alignment, and that endianness is made up, that types are the same size across 64 and 32 bit platforms, and also pretend that pointers inside a struct will work fine when sent across the network; the question remains still: Why? Is THIS your bottleneck? Will a couple memcpy() operations that are likely no-op if your structs happen to line up kill your perf?
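A rough sketch of what "a couple memcpy() operations" could look like. The struct and the 7-byte wire layout are invented for illustration. This only removes the dependence on padding and alignment; the bytes are still in host order, so endianness would need separate handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory struct; the compiler is free to pad it. */
struct sample {
    uint8_t  kind;
    uint32_t value;
    uint16_t flags;
};

/* Wire size is the sum of the field sizes, with no padding. */
enum { SAMPLE_WIRE_SIZE = 1 + 4 + 2 };

/* Copy each field to a fixed offset in the output buffer. */
static void sample_encode(const struct sample *s, unsigned char out[SAMPLE_WIRE_SIZE]) {
    memcpy(out + 0, &s->kind,  sizeof s->kind);
    memcpy(out + 1, &s->value, sizeof s->value);
    memcpy(out + 5, &s->flags, sizeof s->flags);
}

/* Copy each field back out; memcpy makes the unaligned reads safe. */
static void sample_decode(struct sample *s, const unsigned char in[SAMPLE_WIRE_SIZE]) {
    memcpy(&s->kind,  in + 0, sizeof s->kind);
    memcpy(&s->value, in + 1, sizeof s->value);
    memcpy(&s->flags, in + 5, sizeof s->flags);
}
```

Compilers typically turn fixed-size `memcpy` calls like these into plain loads and stores, which is the point the comment is making about cost.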
I guess to not have to set up protobuf or asn1. Those preconditions of both platforms using the same padding and endianness aren't that hard to satisfy if you own it all.

But do you really have such a complex struct where everything inside is fixed-size? I wouldn't be surprised if it happens, but this isn't as general-purpose as the article suggests.

There are at least 10 steps between protobuf and casting a struct to a char*.
"Portable" originally meant "able to be ported" and not "is already ported".
No defined binary encoding, no guarantee about concurrent modifications, performance trade-offs (mmap is NOT always faster than sequential reads!) and more.
Doesn't that just describe low level file IO in general?
Because a struct might not serialize the same way from a CPU architecture to another.

The sizes of ints, the byte order and the padding can be different for instance.

C has had fixed size int types since C99. And you've always been able to define struct layouts with perfect precision (struct padding is well defined and deterministic, and you can always use __attribute__((packed)) and bit fields for manual padding).
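One way to make that layout claim checkable rather than assumed is to pin it down with compile-time assertions (C11). The header struct here is hypothetical, and the expected offsets assume a mainstream ABI; the point is that the build fails if the assumption is wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical file header using only fixed-width types, with
 * fields ordered so no implicit padding should be needed. */
struct file_header {
    uint32_t magic;        /* expected offset 0 */
    uint16_t version;      /* expected offset 4 */
    uint16_t flags;        /* expected offset 6 */
    uint64_t payload_len;  /* expected offset 8 */
};

/* If a compiler or target lays this out differently, compilation
 * stops here instead of producing an incompatible file. */
static_assert(offsetof(struct file_header, magic) == 0, "magic offset");
static_assert(offsetof(struct file_header, version) == 4, "version offset");
static_assert(offsetof(struct file_header, flags) == 6, "flags offset");
static_assert(offsetof(struct file_header, payload_len) == 8, "payload_len offset");
static_assert(sizeof(struct file_header) == 16, "header size");
```

This says nothing about byte order, which still has to be handled separately.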

Endianness might kill your portability in theory, but in practice, nobody uses big endian anymore. Unless you're shipping software for an IBM mainframe, little endian is portable.

You just define the structures in terms of some e.g. uint32_le etc types for which you provide conversion functions to native endianness. On a little endian platform the conversion is a no-op.
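A minimal sketch of that idea, with made-up names (`uint32_le`, `le32_load`, `le32_store`, `record_header`). The wrapper is a byte array, so it has no alignment requirement and can't be used as a plain integer by accident; the size assertion assumes the compiler adds no padding to byte-array structs, which holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk type: a 32-bit value stored little-endian. */
typedef struct { uint8_t b[4]; } uint32_le;

/* Assemble the native value from bytes; correct on any host. */
static inline uint32_t le32_load(uint32_le v) {
    return (uint32_t)v.b[0]
         | (uint32_t)v.b[1] << 8
         | (uint32_t)v.b[2] << 16
         | (uint32_t)v.b[3] << 24;
}

/* Split the native value into little-endian bytes. */
static inline uint32_le le32_store(uint32_t x) {
    uint32_le v = {{ (uint8_t)x, (uint8_t)(x >> 8),
                     (uint8_t)(x >> 16), (uint8_t)(x >> 24) }};
    return v;
}

/* Hypothetical structure defined purely in terms of the wrapper type. */
struct record_header {
    uint32_le magic;
    uint32_le length;
};

static_assert(sizeof(struct record_header) == 8, "unexpected padding");
```

On a little-endian target, optimizing compilers commonly reduce `le32_load` and `le32_store` to a single load or store, which matches the "no-op" description above.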
It can be made to work (as you point out), and the core idea is great, but the implementation is terrible. You have to stop and think about struct layout rules rather than declaring your intent and having the compiler check for errors. As usual C is a giant pile of exquisitely crafted footguns.

A "sane" version of the feature would provide for marking a struct as intended for ser/des at which point you'd be required to spell out every last alignment, endianness, and bit width detail. (You'd still have to remember to mark any structs used in conjunction with mmap but C wouldn't be any fun if it was safe.)