I can’t entirely tell what the article’s point is. It seems to be trying to say that many languages can mmap bytes, but:

> (as far as I'm aware) C is the only language that lets you specify a binary format and just use it.

I assume they mean:

  struct foo { fields; };
  struct foo *data = mmap(…);

And yes, C is one of relatively few languages that let you do this without complaint, because it’s a terrible idea. And C doesn’t even let you specify a binary format: it lets you write a struct whose layout (field sizes, padding, alignment, byte order) is whatever the C ABI on your particular system says it is, and that may or may not correspond to the binary format you had in mind.

If you want to access a file containing a bunch of records using mmap, and you want a well-defined format and good performance, then use something actually intended for the purpose. Cap’n Proto and FlatBuffers are fast but often produce rather large output; protobuf and its ilk are more space efficient and very widely supported; Parquet and Feather can have excellent performance and space efficiency if you use them for their intended purposes. And everything needs to deal with the fact that, in any C-like language, carelessly accessing mmapped data that is modified while you read it is undefined behavior (UB).

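To make the struct-versus-format distinction concrete, here is a minimal sketch of decoding records explicitly instead of casting the mapping to a struct pointer. The layout (a 4-byte id followed by 2-byte flags, little-endian, no padding) and every name in it (`record`, `RECORD_SIZE`, `load_le32`, `load_le16`, `decode_record`) are made up for illustration; nothing here comes from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record: 4-byte id, then 2-byte flags, both
   little-endian, with no padding (6 bytes per record). */
enum { RECORD_SIZE = 6 };

/* In-memory form; its layout is free to differ from the file layout. */
struct record {
    uint32_t id;
    uint16_t flags;
};

/* Assemble little-endian integers one byte at a time, so the result does
   not depend on host endianness, alignment, or struct padding. */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

uint16_t load_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode record number `index` from a byte buffer, e.g. an mmapped file.
   The caller must ensure the buffer holds at least index + 1 records. */
struct record decode_record(const unsigned char *buf, size_t index) {
    assert(buf != NULL);
    const unsigned char *p = buf + index * RECORD_SIZE;
    struct record r = { load_le32(p), load_le16(p + 4) };
    return r;
}
```

On a little-endian host a decent compiler will usually turn the byte-wise loads into plain loads, so the portability is close to free. Note that this does nothing about the concurrent-modification problem.
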
We're so deep in this hole that people are fixing this in silicon, at the CPU level.
The Graviton team made a little-endian version of ARM just so that lazy code like this could migrate away from Intel chips without anyone having to rewrite the struct unpacking (as did IBM with ppc64le).
Early in my career, I spent a lot of my time converting Java bytecode to little-endian to match all the bytecode interpreter enums I had, and completely hating how 0xCAFEBABE would literally read BE BA FE CA (jokingly referred to as "be bull shit") in a (gdb) x view.
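For anyone who hasn't hit this, a small sketch of why the magic number reads backwards. Class files store 0xCAFEBABE big-endian, so the bytes in the file are CA FE BA BE; load those four bytes as a native word on a little-endian machine and you get 0xBEBAFECA. The names `CLASS_MAGIC`, `read_be32` and `read_native32` are mine, purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First four bytes of a Java class file, in file order. */
static const unsigned char CLASS_MAGIC[4] = { 0xCA, 0xFE, 0xBA, 0xBE };

/* Assemble a big-endian 32-bit value byte by byte; the result is the
   same on every host. */
uint32_t read_be32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Reinterpret the same bytes as a native word, which is roughly what a
   word-sized debugger dump shows. The result depends on host endianness:
   0xBEBAFECA on little-endian, 0xCAFEBABE on big-endian. */
uint32_t read_native32(const unsigned char *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Dumping byte by byte instead of word by word (for example `x/4xb` rather than `x/xw` in gdb) shows the bytes in file order.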