Hacker News
by jcalvinowens 76 days ago
Don't ignore endianness. But making little endian the default is the right thing to do: it is so much more ubiquitous in the modern world.

The vast majority of modern network protocols use little endian byte ordering. Most Linux filesystems use little endian for their on-disk binary representations.

There is absolutely no good reason for networking protocols to be defined to use big endian. It's an antiquated arbitrary idea: just do what makes sense.

Use these functions to avoid ifdef noise: https://man7.org/linux/man-pages/man3/endian.3.html
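
A minimal sketch of what that looks like, assuming glibc's `<endian.h>` (the functions are nonstandard; the BSDs keep similar ones in `<sys/endian.h>`). `put_le32` and `get_le32` are hypothetical helper names, not something from the man page:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: expose htole32(), le32toh(), etc. */
#endif
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a host-order value into buf as little endian. */
static void put_le32(unsigned char *buf, uint32_t host_value)
{
    assert(buf != NULL);
    uint32_t wire = htole32(host_value);  /* no-op on little endian hosts */
    memcpy(buf, &wire, sizeof wire);      /* memcpy avoids unaligned access */
}

/* Hypothetical helper: read a little endian value from buf into host order. */
static uint32_t get_le32(const unsigned char *buf)
{
    assert(buf != NULL);
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return le32toh(wire);                 /* byte swap only on big endian hosts */
}
```

The same source compiles for either host byte order, with no `#ifdef` on the caller's side.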


What do you mean by “networking protocols,” exactly? Most packet level Internet protocols (TCP, UDP, etc.) are big endian. Ethernet is big endian at the octet level and little endian on the wire at the bit level. Network order is big endian because it has to be something and it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom.

There is no right answer to endianness. It’s like which side of the road cars should drive on. You just need to pick one and stick with it. Mostly people bitch about endianness when their processor is the opposite of whatever someone else picked.

But processors are all over the map. IBM mainframes are big endian. Motorola 68k is big. HP PA-RISC is big. IBM Power started big and then went bi. MIPS is bi. RISC-V is little. ARM is bi but dominantly little (AArch64). And of course x86 is little. So, take your pick.

That said, little endianness is the right answer as is driving on the right side of the road.

> RISC-V is little

These days it's bi, actually :) Although I don't see any CPU designer actually implementing that feature, except maybe MIPS (who have stopped working on their own ISA, and now want all their locked-in customers to switch to RISC-V without worrying about endianness bugs)

Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big. But IIRC the standard profiles specify little, so it's pretty much always going to be little. But yea, technically speaking data load/store could be big. Maybe that's important for some embedded environments.
> Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big

ARM works the same way. And SPARC is the opposite: instructions are always big-endian, but data can be switched to little-endian.

I read your reply as mostly agreeing with me: endianness is arbitrary, using big endian for a novel protocol just because some widely used protocols decided to decades ago is silly.

> it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom.

There are many reasons for big endian... but that is not one of them :)

> But processors are all over the map

That's not true anymore, big endian is dead. Upstream Linux is refusing to support big endian riscv at all, and is making serious noises about ripping out the existing big endian aarch64 support because the companies that ship the hardware that needs it don't work upstream.

> it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom

This argument is pretty silly: visualizations can always be changed. For some time I have been thinking that hexdumps on little-endian systems ought to be written right-to-left. In fact, when I once decided to include such a right-to-left dumper in my own software, it took me very little time to get used to it, and I immediately started regretting that I don't have it available everywhere.
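
A rough sketch of the idea, not the dumper mentioned above: `format_row_rtl` is a hypothetical helper that formats one row of bytes with the lowest address on the right, so a little-endian integer reads in its usual digit order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: format n bytes as hex, lowest address rightmost.
 * Returns the number of characters written (excluding the NUL), or 0 if
 * n is zero or out is too small (nothing is written in that case). */
static size_t format_row_rtl(const unsigned char *data, size_t n,
                             char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    assert(data != NULL && out != NULL);

    /* 2n hex digits + (n - 1) spaces + 1 NUL = 3n bytes needed. */
    if (n == 0 || out_size < n * 3)
        return 0;

    size_t pos = 0;
    for (size_t i = n; i-- > 0; ) {       /* walk from highest address down */
        out[pos++] = hex[data[i] >> 4];
        out[pos++] = hex[data[i] & 0x0f];
        if (i != 0)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
    return pos;
}
```

With the bytes `78 56 34 12` in memory (the little-endian encoding of 0x12345678), the row comes out as `12 34 56 78`.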

You should actually not use format-swapping operations.

You should actually use format-swapping loads/stores (i.e., deserialization/serialization).

This is because your computer cannot compute on values of non-native endianness. As such, the value is logically converted back and forth on every operation. Of course, a competent optimizer can elide these conversions, but such actions fundamentally lack machine sympathy.

The better model is viewing the endianness as a serialization format and converting at the boundaries of your compute engine. This ensures you only need to care about endianness when serializing and deserializing wire formats and that you have no accidental mixing of formats in your internals; everything has been parsed to native before any computation occurs.

Essentially, non-native endianness should only exist in memory and preferably only memory filled in by the outside world before being parsed.
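
To make that concrete, here is a minimal sketch of converting at the boundary. The 8-byte big-endian header layout, `struct msg_header`, and all function names are made up for illustration. The loads and stores work on bytes, so the same code is correct on any host byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format-swapping loads: big-endian bytes in, native integer out. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] << 8 | p[1]);
}

static uint32_t load_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Format-swapping store: native integer in, big-endian bytes out. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical parsed header: every field is native after parsing. */
struct msg_header {
    uint16_t type;
    uint16_t length;
    uint32_t sequence;
};

/* Parse the assumed wire layout: type(2) length(2) sequence(4), big endian.
 * Returns 0 on success, -1 if the buffer is too short. */
static int parse_header(const unsigned char *buf, size_t len,
                        struct msg_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->type     = load_be16(buf);
    out->length   = load_be16(buf + 2);
    out->sequence = load_be32(buf + 4);
    return 0;
}
```

Nothing past `parse_header` ever sees a wire-order integer, so there is no way to mix formats by accident in the rest of the program.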

Somebody has to actually write the code at some point, it can't be serialization abstractions all the way down. That's what I'm talking about.