by GMoromisato 640 days ago
I think the article's author would say that loading data "without having to encode or decode each element" is premature optimization and more likely to have bugs. I tend to agree.
4 comments

The optimisation the parent is referring to is development time and effort: if the alternative to dumping a structure to a file is to hand-roll your own serialiser/deserialiser, that is slower and probably more error-prone (depending on the context).
The article makes an argument that the hand-rolled solution is less buggy, if you approach it the right way.

For complicated data structures, it's probably best to use a library that serializes to a common standard. (For example, protocol buffers or JSON.)

But I think the article assumes you don't get to choose the protocol, so it probably has to be hand-written by someone.
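To make "hand-written" concrete, here is a minimal sketch of decoding a fixed protocol element by element, using Python's `struct` module with an explicit byte order. The record layout (a u16 id, a u32 length, and an f32 scale, all little-endian with no padding) and the names `RECORD`, `encode_record`, and `decode_record` are assumptions for illustration, not anything taken from the article.

```python
import struct

# Hypothetical wire format (an assumption for illustration):
#   u16 id, u32 length, f32 scale; little-endian, no padding.
# The "<" prefix fixes both byte order and packing, so the result
# does not depend on the host platform or compiler.
RECORD = struct.Struct("<HIf")


def encode_record(rec_id, length, scale):
    """Serialise one record into exactly RECORD.size bytes."""
    return RECORD.pack(rec_id, length, scale)


def decode_record(buf):
    """Parse one record, rejecting buffers of the wrong size."""
    if len(buf) != RECORD.size:
        raise ValueError(f"expected {RECORD.size} bytes, got {len(buf)}")
    return RECORD.unpack(buf)
```

The length check is the kind of validation you get for free when each element is decoded explicitly, and it has no equivalent when a struct is simply read from a file into memory.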

Maybe if you hand-roll the struct layout, but if you use something like FlatBuffers I doubt you would see many more bugs, and FlatBuffers will take care of endian swaps as necessary without you needing to think about it.
Depends what you’re doing. I have a side project that generates CSVs in the GB range. It keeps everything in bytes because encode/decode is a lot of overhead in loops when you’re hitting them millions of times.
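As a rough illustration of "keeping everything in bytes" (the comment does not say which language the project uses, so Python and the helper name `write_rows` are assumptions), the idea is that fields which already arrive as bytes are joined and written without ever being decoded to text:

```python
import io


def write_rows(out, rows):
    """Write rows of already-encoded byte fields as CSV lines.

    Assumes the fields contain no commas, quotes, or newlines,
    so no CSV quoting is applied.
    """
    for row in rows:
        out.write(b",".join(row))
        out.write(b"\n")


# Usage: any binary file-like object works; BytesIO stands in for a file.
buf = io.BytesIO()
write_rows(buf, [[b"a", b"1"], [b"b", b"2"]])
```

This skips a decode/encode round trip per field, which is where the overhead in a hot loop would come from.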
Not once you start getting into the range of hundreds of megabytes or more, which accounts for most situations where I'd use a binary format in the first place.
By the time I’m putting hundreds of MB somewhere, I want a defined format, not whatever the compiler happens to generate for this particular build of my software. There are plenty of nice ways to do this.
Struct layouts in C are defined by the platform's ABI, and on every sane platform, that just looks like "lay out each element in order, adding the smallest-possible amount of padding to satisfy alignment requirements" [0]. There are presumably oddball platforms which do something else, but good luck actually finding one that has lots of RAM, an ordinary filesystem, and so on. (Within the realm of sane platforms, there are a few alignment oddities, but it's always safe to build packed structs as if each type is aligned to its size.)

Struct layouts for FFI in other languages tend to follow the C convention and/or allow explicit field offsets to be specified. Regardless, if you use the proper language constructs, it's nowhere near as undefined as "whatever the compiler happens to generate".

[0] https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...
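The layout rule quoted above can be sketched in a few lines and checked against what an FFI layer reports. This uses Python's `ctypes`, which follows the platform's C convention for `Structure`; the field names and the helper `c_style_layout` are made up for illustration.

```python
import ctypes


def c_style_layout(fields):
    """Place each field in order, padding only as much as alignment needs."""
    offset = 0
    max_align = 1
    offsets = {}
    for name, ctype in fields:
        align = ctypes.alignment(ctype)
        # Round the offset up to the next multiple of the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += ctypes.sizeof(ctype)
        max_align = max(max_align, align)
    # Tail padding: the total size is a multiple of the strictest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical fields chosen so that padding is needed between them.
FIELDS = [
    ("tag", ctypes.c_uint8),
    ("value", ctypes.c_uint32),
    ("flag", ctypes.c_uint16),
]


class Sample(ctypes.Structure):
    _fields_ = FIELDS


predicted_offsets, predicted_size = c_style_layout(FIELDS)
actual_offsets = {name: getattr(Sample, name).offset for name, _ in FIELDS}
```

On mainstream platforms the prediction should match `actual_offsets` and `ctypes.sizeof(Sample)`, which is the sense in which the layout is defined by the ABI rather than by the whim of a particular build.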