by LegionMammal978 639 days ago
IME, there's one big thing that often makes my programs depend on byte order: wanting to quickly splat data structures into and out of files, pipes, and sockets, without having to encode or decode each element one-by-one. The only real way to make this endian-independent is to have byte-swapping accessors for everything at the point where it's ultimately produced or consumed, but adding all the code for that is very tedious in most languages. One can argue that handling endianness is the responsible thing to do, but it just doesn't seem worthwhile when I practically know that no one will ever run my code on a big-endian processor.
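A minimal C sketch of the "splat" approach, using a hypothetical `struct record`. Here the bytes go to a memory buffer, but `fwrite`/`fread` on a `FILE *` would copy exactly the same bytes. Nothing is encoded per element, which is the appeal, and also why the output depends on the host's byte order and padding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    uint32_t id;
    uint16_t flags;
    uint16_t count;
    double   value;
};

/* "Splat" the struct: copy its in-memory bytes verbatim. The result is in
 * the host's native byte order and layout, so it only round-trips reliably
 * between builds that share the same ABI. */
void record_dump(const struct record *r, unsigned char out[sizeof(struct record)])
{
    memcpy(out, r, sizeof *r);
}

/* Load it back the same way: one bulk copy, no per-field decoding. */
void record_load(struct record *r, const unsigned char in[sizeof(struct record)])
{
    memcpy(r, in, sizeof *r);
}
```

On a little-endian host such as x86-64, `out[0]` ends up holding the low byte of `id`; on a big-endian host it would hold the high byte, which is exactly the portability gap in question.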

This is functionally identical to the author's example: the file has a defined byte order, and you can either byte-swap or explicitly write out the bytes in that defined order. The author is saying that your goal of avoiding "having to encode or decode each element one-by-one" is a misguided optimization.
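Writing the bytes out explicitly in a defined order needs no knowledge of the host's byte order at all. A minimal C sketch, assuming the format specifies little-endian 32-bit integers; `put_le32` and `get_le32` are hypothetical helper names:

```c
#include <stdint.h>

/* Write v into out[0..3] in little-endian order using shifts, so the
 * result is identical on any host, whatever its native byte order. */
void put_le32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v);
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Read a little-endian 32-bit value. Casting each byte to uint32_t before
 * shifting keeps "<< 24" from overflowing a promoted (signed) int. */
uint32_t get_le32(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | (uint32_t)in[1] << 8
         | (uint32_t)in[2] << 16
         | (uint32_t)in[3] << 24;
}
```

Because only shifts and masks on values are involved, there is no `#ifdef` for endianness and no byte-swap call; the compiler is free to turn the loads into a single native load on little-endian targets.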
Byte swapping is equivalent to encoding and decoding, is it not?
The benefit is that you'd only have to do it for the parts of the data that are actively manipulated, which might be far less than the entirety of the data structure. Also, you can easily forward a copy elsewhere in the original format.
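A sketch of that idea in C, assuming a hypothetical 12-byte message whose `length` field is a little-endian 32-bit value at offset 4: only that field is decoded or updated, and every other byte stays in wire format, so the buffer can be forwarded as-is.

```c
#include <stdint.h>

/* Hypothetical wire layout: the format, not the host, fixes these offsets. */
enum { MSG_LENGTH_OFFSET = 4, MSG_SIZE = 12 };

/* Decode only the field being read; the rest of the buffer is untouched. */
uint32_t msg_get_length(const unsigned char *msg)
{
    const unsigned char *p = msg + MSG_LENGTH_OFFSET;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Update the field in place, keeping the whole buffer in wire format. */
void msg_set_length(unsigned char *msg, uint32_t v)
{
    unsigned char *p = msg + MSG_LENGTH_OFFSET;
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}
```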

But if you know you're not going to have endianness problems, you can just skip that step entirely.
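Skipping the conversion is safest when the "no endianness problems" assumption is checked rather than just remembered. A sketch in C: the `#error` guard relies on the `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` macros that GCC and Clang predefine (other compilers may not define them), and `host_is_little_endian` is a hypothetical portable fallback for a run-time check.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to build on a big-endian target instead of silently writing
 * files that little-endian builds cannot read. Skipped on compilers
 * that do not predefine __BYTE_ORDER__. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this program dumps structs in native order and assumes a little-endian host"
#endif

/* Run-time check: inspect the first byte of a known 16-bit value.
 * memcpy avoids any aliasing questions about reading through a cast. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```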

I think the article's author would say that loading data "without having to encode or decode each element" is premature optimization and more likely to have bugs. I tend to agree.
The optimisation the parent is referring to is development time/effort; if the alternative to dumping a structure to a file is to hand-roll your serialiser/deserialiser, that's a slower & probably more error-prone approach (depending on the context).
The article makes an argument that the hand-rolled solution is less buggy, if you approach it the right way.
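One way to approach a hand-rolled decoder "the right way" is to route every field through a small bounds-checked cursor, so a short or corrupt input fails cleanly instead of reading past the buffer. A minimal C sketch; the `reader` type, its functions, and the three-field header format are all hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cursor over an input buffer; ok becomes false on the first short read
 * and stays false, so later reads are harmless no-ops. */
struct reader {
    const unsigned char *p;
    size_t left;
    bool ok;
};

static bool take(struct reader *r, size_t n, const unsigned char **out)
{
    if (!r->ok || r->left < n) {
        r->ok = false;
        return false;
    }
    *out = r->p;
    r->p += n;
    r->left -= n;
    return true;
}

uint16_t read_le16(struct reader *r)
{
    const unsigned char *b;
    if (!take(r, 2, &b)) return 0;
    return (uint16_t)(b[0] | b[1] << 8);
}

uint32_t read_le32(struct reader *r)
{
    const unsigned char *b;
    if (!take(r, 4, &b)) return 0;
    return (uint32_t)b[0] | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical header: id (u32), flags (u16), count (u16), little-endian,
 * decoded field by field in the order the format defines. */
struct header { uint32_t id; uint16_t flags, count; };

bool parse_header(const unsigned char *buf, size_t len, struct header *h)
{
    struct reader r = { buf, len, true };
    h->id = read_le32(&r);
    h->flags = read_le16(&r);
    h->count = read_le16(&r);
    return r.ok;  /* one check covers every field */
}
```

The sticky `ok` flag is the design choice that keeps the per-field code short: no field needs its own error branch, yet a truncated input is still rejected.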

For complicated data structures, it's probably best to use a library that serializes to a common standard. (For example, protocol buffers or JSON.)

But I think the article assumes you don't get to choose the protocol, so it probably has to be hand-written by someone.

Maybe if you hand-roll the struct layout, but if you use something like FlatBuffers I doubt you would see many more bugs, and FlatBuffers will take care of endian swaps as necessary without you needing to think about it.
Depends what you’re doing. I have a side project that generates CSVs in the GB range. It keeps everything in bytes because encode/decode is a lot of overhead in loops when you’re hitting them millions of times.
Not once you start getting into the range of hundreds of megabytes or more, which accounts for most situations where I'd use a binary format in the first place.
By the time I’m putting hundreds of MB somewhere, I want a defined format, not whatever the compiler happens to generate for this particular build of my software. There are plenty of nice ways to do this.
Struct layouts in C are defined by the platform's ABI, and on every sane platform, that just looks like "lay out each element in order, adding the smallest possible amount of padding to satisfy alignment requirements" [0]. There are presumably oddball platforms that do something else, but good luck actually finding one that has lots of RAM, an ordinary filesystem, and so on. (Within the realm of sane platforms, there are a few alignment oddities, but it's always safe to build packed structs as if each type is aligned to its size.)
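To make the layout rule concrete, a C11 sketch that pins two hypothetical structs with `static_assert` and `offsetof`. The asserted values assume a typical ABI where `uint32_t` is 4-byte aligned and `uint16_t` 2-byte aligned (as on the common x86-64 and AArch64 ABIs); the second struct follows the parenthetical's advice, placing every field at a multiple of its own size with the padding spelled out.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* In-order layout with implicit padding: 3 bytes after tag so that id
 * lands on a 4-byte boundary, plus 2 trailing bytes so the size is a
 * multiple of the struct's 4-byte alignment. */
struct implicit_pad {
    uint8_t  tag;    /* offset 0 */
    uint32_t id;     /* offset 4 */
    uint16_t kind;   /* offset 8 */
};
static_assert(offsetof(struct implicit_pad, id) == 4, "id after 3 padding bytes");
static_assert(offsetof(struct implicit_pad, kind) == 8, "kind follows id");
static_assert(sizeof(struct implicit_pad) == 12, "2 bytes of tail padding");

/* Same fields, ordered largest-first with the padding made explicit,
 * so no byte of the struct is left for the compiler to decide. */
struct explicit_pad {
    uint32_t id;       /* offset 0 */
    uint16_t kind;     /* offset 4 */
    uint8_t  tag;      /* offset 6 */
    uint8_t  reserved; /* offset 7: explicit padding, write as zero */
};
static_assert(offsetof(struct explicit_pad, tag) == 6, "tag after kind");
static_assert(sizeof(struct explicit_pad) == 8, "no implicit padding");
```

If a build ever lands on an ABI that lays things out differently, the `static_assert`s turn a silent file-format mismatch into a compile error.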

Struct layouts for FFI in other languages tend to follow the C convention and/or allow explicit field offsets to be specified. Regardless, if you use the proper language constructs, it's nowhere near as undefined as "whatever the compiler happens to generate".

[0] https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...