by _rend 327 days ago
For completeness, this description of alignment is misleading:

> Well, dear reader, this padding is added because the CPU needs memory to be aligned in sets of 4 bytes because it’s optimized in that fashion.

> ...

> Remember: since structs are aligned to 4 bytes, any padding is therefore unnecessary if the size of the struct is a multiple of 4 without the padding.

Individual data types have their own alignment (e.g., `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long` may be 8, etc.), and the alignment of a compound type (like a struct) defaults to the maximum alignment of its constituent types.

In this article, `struct Monster` has an alignment of 4 because `int` and `float` have an alignment of 4 for the author's configuration. Expanding one of the `int`s to a `long` could increase the alignment to 8 on some CPUs, and removing the `int` and `float` fields would decrease the alignment to 1 for most CPUs.
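
A minimal C11 sketch of the rule described above. The fields of `struct Monster` and the `struct Flags` type are hypothetical stand-ins, since the article's exact definition is not reproduced in this thread, and the actual numbers depend on your compiler and target.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical layout, loosely modeled on the article's struct Monster. */
struct Monster {
    char  is_alive; /* alignof(char) is 1 */
    int   health;   /* alignment is commonly 4 */
    float speed;    /* alignment is commonly 4 */
};

/* Same idea with the int and float members removed. */
struct Flags {
    char a;
    char b;
};

/* A struct has to be at least as strictly aligned as each of its members,
   otherwise the members could not stay aligned inside an array of structs. */
static_assert(alignof(struct Monster) >= alignof(int), "struct covers int");
static_assert(alignof(struct Monster) >= alignof(float), "struct covers float");

/* sizeof is a multiple of the alignment; that is where trailing padding comes from. */
static_assert(sizeof(struct Monster) % alignof(struct Monster) == 0,
              "size is a multiple of alignment");

/* Largest alignment among the member types of struct Monster. */
size_t monster_max_member_align(void) {
    size_t m = alignof(char);
    if (alignof(int) > m) m = alignof(int);
    if (alignof(float) > m) m = alignof(float);
    return m;
}
```

On mainstream ABIs `alignof(struct Monster)` equals `monster_max_member_align()` and `alignof(struct Flags)` is 1, but only the "at least" relations asserted above are safe to rely on.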


Also keep in mind that this is all very CPU- and compiler-specific. I had one compiler that packed everything at 4/8, usually 8, not the 1/2/4/8 you would expect. That was because the CPU would just segfault if you didn't play nice with data access. If you set the packing yourself, the compiler hid a lot of it with offsets, memory moves, and shifting. It was clever but slow. So by default they picked a packing wide enough to remove the extra instructions, at the cost of using more memory. x86 was by far the most forgiving at the time I was doing this. ARM was the least forgiving (at least on the platform I was using), with MIPS being OK in some cases but not others.
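
For the strict-alignment case described above, a common portable idiom is to copy the bytes into an aligned local with `memcpy` instead of dereferencing a possibly misaligned pointer. A sketch using only standard C; `load_u32_unaligned` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order from an address that carries no
   alignment guarantee. Casting p to uint32_t* and dereferencing it instead
   would be undefined behavior whenever p is misaligned for uint32_t. */
uint32_t load_u32_unaligned(const unsigned char *p) {
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v); /* byte-wise copy, valid at any address */
    return v;
}
```

How this compiles is up to the toolchain: on forgiving targets it can become a single load, while on strict ones the compiler has to emit the kind of byte loads and shifts mentioned above.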
Some of the Cray hardware was basically pure 64-bit. The systems largely didn’t recognize smaller granularity. I learned a lot of lessons about writing portable C by writing code for Cray systems.
On one of these less forgiving architectures, how does one write programs that read some bytes off the network, bitcast them into a struct, and do something based on that?

On x86 you would use a packed struct that matches the wire protocol.

Wouldn’t this require extra copying if member reads were forced to be aligned?

Yep, exactly that. I had that exact issue: junk coming in from a TCP/PPP connection that then had to be unpacked. Tons of garbage moves and byte offsetting, and then making sure you keep the endianness correct too. Luckily, on the platform I was using, memcpy could do most of what I needed. Not the best way to do it, but the wildly out-of-date branch of gcc could handle it. I got pretty good at picking junk out of random streams with shifting, and/or, and whatever else was needed. Totally useless skill for what I work on these days.
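
A sketch of that kind of unpacking, written so that it relies neither on packed structs nor on the host's alignment rules or endianness. The wire format (a 1-byte type, a big-endian 16-bit length, and a big-endian 32-bit sequence number, 7 bytes in total) and the names `wire_header` and `parse_wire_header` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WIRE_HEADER_SIZE = 7 }; /* hypothetical wire format: 1 + 2 + 4 bytes */

/* In-memory form: an ordinary struct that the compiler is free to pad. */
struct wire_header {
    uint8_t  type;
    uint16_t length;
    uint32_t sequence;
};

/* Assemble big-endian integers one byte at a time.
   Single-byte loads cannot be misaligned, and the shifts fix the byte order. */
static uint16_t read_be16(const unsigned char *p) {
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer holds fewer than WIRE_HEADER_SIZE bytes. */
int parse_wire_header(const unsigned char *buf, size_t len, struct wire_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < WIRE_HEADER_SIZE) return -1;
    out->type     = buf[0];
    out->length   = read_be16(buf + 1); /* offset 1 from the buffer start */
    out->sequence = read_be32(buf + 3); /* offset 3 from the buffer start */
    return 0;
}
```

This does cost a copy per field, which is the extra work asked about above; the trade is that the same source behaves identically on forgiving and strict targets.
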
Um ... isn't alignment generally dictated by the platform ABI so that programs compiled by different compilers can be linked together?
The widely used platforms with multiple compilers generally have one or more written down ABIs that the compilers all follow, but more niche platforms frequently have exactly one compiler (often a very out of date fork of gcc) that just does whatever they felt like implementing and may not even support linking together things built by different versions of that one compiler.
We had that exact thing. Our target at the time was about 6 different platforms, 2 of which had very picky compilers/ABIs. We were trying to keep it to one codebase with minimal #ifdef callouts. We learned very quickly that not all compilers are the same, even though they may have the same name and version number. And the standard libraries are subtly different enough from each other that you really have to pay attention to what you are doing.
Ideally yes, but practically there are at least a dozen just for x86 (there's like 3 big ones).
AFAIK alignment doesn't even matter anymore (for CPU data at least) since the 'word size' of a modern CPU is the size of a cache line (32 or 64 bytes?), i.e. unaligned accesses within a 32 or 64 byte block are no different from aligned accesses.

(technically there is still an advantage of items aligned to their size in that such an item can never straddle adjacent cache lines though)
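
The straddling point can be checked with a little address arithmetic. A sketch assuming a 64-byte cache line (the real size is CPU-specific); `straddles_cache_line` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHE_LINE_SIZE = 64 }; /* assumption; the real value depends on the CPU */

/* True if the byte range [addr, addr + size) touches more than one cache line. */
bool straddles_cache_line(uintptr_t addr, size_t size) {
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, an 8-byte value at offset 60 covers bytes 60 through 67 and so crosses from line 0 into line 1, while an 8-byte value at any multiple of 8 stays inside one line.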

And there's also still tons of different alignment requirements when working with GPU data - and interestingly those alignment requirements may differ from C's alignment rules, so you may need to explicitly use packed structs (which are still not a standard C feature!) with manual padding.
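
As an illustration of manual padding, here is a sketch for a GPU-side layout in which a 3-float vector is aligned to 16 bytes, as in GLSL's std140 rules. The struct and field names are hypothetical. It sidesteps packed structs by making the padding explicit, and it assumes `float` is 4 bytes with 4-byte alignment; the `static_assert`s turn a mismatch into a compile error.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */

/* CPU-side mirror of a hypothetical GPU block { vec3 position; vec3 color; }.
   Plain C would put color at offset 12; the layout assumed here expects 16. */
struct LightGpu {
    float position[3];
    float pad0;     /* manual padding: pushes color to offset 16 */
    float color[3];
    float pad1;     /* manual padding: rounds the size up to 32 */
};

static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");
static_assert(offsetof(struct LightGpu, position) == 0, "position at 0");
static_assert(offsetof(struct LightGpu, color) == 16, "color at 16");
static_assert(sizeof(struct LightGpu) == 32, "total size 32");

/* What C's own rules produce without the manual padding, for comparison. */
struct LightNaive {
    float position[3];
    float color[3];
};
```

On a typical target `offsetof(struct LightNaive, color)` is 12, which is exactly the mismatch the padding members are there to fix.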

My understanding is that C++ compilers still add padding by default for performance reasons. CPU will have to spend a few cycles to reorganize data that is not aligned in chunks of 4 bytes.
Daniel Lemire did some measuring ~~recently~~ (oops, in 2012):

https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

TL;DR: 10% difference on what in 2012 was a low-end CPU, no difference on "new in 2012" CPUs. So my guess is that by now it really doesn't matter anymore :)

Wasn't aware of that, thanks for the link!
> CPU will have to spend a few cycles to reorganize data that is not aligned in chunks of 4 bytes.

That's not true for quite a lot of CPUs. Pretty much all x64 CPUs and the like don't care.

I.e. the author is wrong; struct s { char x; }; is not required to be four-byte-aligned. It can have a 1 byte size, and thus alignment.
And bool may be 4, and char may be 2(m68k)!
Not unless you're stuck in the 90s ;) These sizes have all been standardized to 1 byte since C99.
Not the sizes, the alignments.
Same thing in C though, the primitive types have the same alignment as their size.
That is true on some systems, but not a portable assumption by any means.

Edit:

st7: sizeof(uint32_t)=4, alignof(uint32_t)=1

msp430: sizeof(int)=2, alignof(int)=1

Real old ARM: sizeof(double)=8, alignof(double)=4

OG M68k: sizeof(char)=1, but in-struct alignment of char=2

Also, "sizeof(bool) is not required to be 1."

I've seen register-sized bool on systems without free conversion between register sizes.
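
Since size and alignment can differ, code that places objects in a raw buffer can ask the compiler for the alignment instead of reusing `sizeof`. A C11 sketch; `align_up` and `next_double_offset` are hypothetical helpers, and `align_up` assumes a power-of-two alignment, which is what C11 requires of every valid alignment value.

```c
#include <assert.h>
#include <stdalign.h> /* alignof (C11) */
#include <stddef.h>   /* size_t */

/* Round offset up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Offset at which a double may be placed after `used` bytes of a suitably
   aligned buffer. This uses alignof(double), which on some targets is
   smaller than sizeof(double). */
size_t next_double_offset(size_t used) {
    return align_up(used, alignof(double));
}
```

Rounding to `sizeof(double)` instead would still produce valid offsets on most targets, but it can waste space where the alignment is smaller than the size.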

> Individual data types have their own alignment (e.g., `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long` may be 8, etc.), and the alignment of a compound type (like a struct) defaults to the maximum alignment of its constituent types.

I will add that this is implementation defined. IIRC the only restriction the standard imposes on the alignment of a struct is that a pointer to it is also a pointer to its first member when converted, meaning its alignment must practically be a multiple of that of its first field.
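
The first-member guarantee mentioned above can be written down directly. A sketch with hypothetical type names: the first member sits at offset 0, so converting the struct pointer yields a valid pointer to that member.

```c
#include <assert.h>
#include <stddef.h>

struct Header {
    int kind;
};

struct Message {
    struct Header header; /* first member: no padding may precede it */
    double payload;
};

static_assert(offsetof(struct Message, header) == 0, "first member is at offset 0");

/* Convert a pointer to the whole struct into a pointer to its first member. */
struct Header *message_header(struct Message *m) {
    assert(m != NULL);
    return (struct Header *)m;
}
```

Because the same address must be valid for both types, the struct's alignment has to satisfy that of its first member.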

Implementation-defined means your specialized platform can be supported without needing to conform. It does not mean that common knowledge is false for common users.
"Implementation-defined" means that there is nothing to conform to as far as the standard is concerned. I have not claimed that "common knowledge is false for common users" or anything to that effect. My comment is additive, which should have been clear to anyone reading the first three words of it.