by mikeash 4573 days ago
The fact that you can't know all of your future targets will support that is a problem.

Endian issues are also a big problem here. The need to byte-swap everything eliminates a lot of the convenience.

It's also way too easy to make breaking changes to the struct, since there's no standard way to mark a struct as being something that you serialize/deserialize, and normally you can change struct fields at will in C code.

Just take the extra five minutes to write code to translate between your struct and a stream of bytes. It'll be easier to understand, less risky, and more compatible.
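
For illustration, here is a minimal sketch of what that translation code might look like. The `struct record`, its fields, the 7-byte big-endian wire layout, and the function names are all hypothetical, chosen only to show the shape of the approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; the field names are illustrative only. */
struct record {
    uint16_t version;
    uint32_t length;
    uint8_t  flags;
};

/* The wire size is fixed by the format (2 + 4 + 1 bytes),
   not by sizeof(struct record), which may include padding. */
enum { RECORD_WIRE_SIZE = 7 };

/* Write the record into out[0..6], most significant byte first. */
static void record_encode(const struct record *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    out[0] = (uint8_t)(r->version >> 8);
    out[1] = (uint8_t)(r->version);
    out[2] = (uint8_t)(r->length >> 24);
    out[3] = (uint8_t)(r->length >> 16);
    out[4] = (uint8_t)(r->length >> 8);
    out[5] = (uint8_t)(r->length);
    out[6] = r->flags;
}

/* Rebuild the record from in[0..6]; works the same on any host byte order. */
static void record_decode(const uint8_t *in, struct record *r)
{
    assert(in != NULL && r != NULL);
    r->version = (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    r->length  = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
                 ((uint32_t)in[4] << 8)  |  (uint32_t)in[5];
    r->flags   = in[6];
}
```

Because the byte layout lives in these two functions rather than in the struct definition, reordering or adding struct fields does not silently change the format.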

"The fact that you can't know all of your future targets will support that is a problem."

I don't see this as being as much of a problem as endianness, but yeah, maybe x86 has made us too accustomed to it. And you can use portable types defined in headers (Linux does that, with u8, u16, etc.).

But sometimes you know your target won't change (for a long time)

"It's also way too easy to make breaking changes to the struct, since there's no standard way to mark a struct as being something that you serialize/deserialize, and normally you can change struct fields at will in C code."

In the same way, you can break your software with any change. You can mark the struct with a naming convention, but the easiest signal is seeing the "pack" directives on it. And unit tests.
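
A rough sketch of what that marking could look like, assuming a compiler that supports `#pragma pack` (GCC, Clang, and MSVC do, but it is an extension, not standard C) and C11 `static_assert`. The struct, its `_wire` suffix, and the helper function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header. The _wire suffix is the naming convention;
   the pack pragmas are the visible hint that the layout matters. */
#pragma pack(push, 1)
struct header_wire {
    uint16_t magic;
    uint32_t size;
    uint8_t  kind;
};
#pragma pack(pop)

/* Compile-time checks: the build fails if a field is added, removed,
   resized, or reordered in a way that moves these offsets. */
static_assert(sizeof(struct header_wire) == 7, "header_wire size changed");
static_assert(offsetof(struct header_wire, magic) == 0, "magic moved");
static_assert(offsetof(struct header_wire, size) == 2, "size moved");
static_assert(offsetof(struct header_wire, kind) == 6, "kind moved");

/* Runtime counterpart for a unit test: returns 1 if the layout matches. */
static int header_wire_layout_ok(void)
{
    return sizeof(struct header_wire) == 7 &&
           offsetof(struct header_wire, size) == 2 &&
           offsetof(struct header_wire, kind) == 6;
}
```

Note that these checks only pin down size and offsets. They say nothing about byte order, so endianness still has to be handled separately.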

"Just take the extra five minutes to write code to translate between your struct and a stream of bytes. It'll be easier to understand, less risky, and more compatible. "

Well, it's not five minutes. And it makes your program slower (for the most convoluted data types). A simple example:

http://en.wikipedia.org/wiki/BMP_file_format

And yes, GIMP reads field by field https://git.gnome.org/browse/gimp/tree/plug-ins/file-bmp/bmp...

Does it really make your program slower? That padding you have to eliminate for this technique is added for speed, after all. The initial read may be faster, but every access to the data is going to be slower.

I'm not quite sure what the BMP file format is supposed to be an example of, especially since file formats are separate from techniques used to read or write them.

Some years ago I had a discussion with a then-colleague who pointed out something interesting that stuck with me: the world has mostly settled on little endian. If you stay outside of certain niches you are unlikely to ever see a big endian CPU.

Time was you might have to run on a SPARC or PowerPC or whatever. But x86 and little-endian ARM have pretty much won for most people.

I don't think my friend was right that this means you can totally ignore the issues, but I have to admit he was right that from a strictly practical perspective it's not the huge deal it was in the 90s. I agree with you that you don't know how the future will break you, but my guess is the momentum of binary compatibility will keep it so for a while.

That's a good point, and he's largely right. On the other hand, I've been burned before (Macs were always going to be 68k until they went PPC, they were always going to be PPC until they went Intel, they were always going to be Intel until Apple made these tiny mini-Macs that were ARM) and I'm wary of making assumptions anymore.

This is usually my instinct too, which is why I attributed the other viewpoint to someone else. But it was interesting to understand where someone like that is coming from. You can spend a lot of time adding the right swaps etc., but in the end, if you don't own hardware that works that way and aren't testing on it regularly, then from a certain perspective you may be wasting your time.

OTOH I recall that insightful article from Rob Pike about how the "right" way to do it in a testable fashion is to not think in terms of swapping at all, and just do shifts that are portable regardless of architecture. http://commandcenter.blogspot.com/2012/04/byte-order-fallacy...
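
A minimal sketch of that shift-based idea, as I understand it from the article. The function names `load_le32` and `load_be32` are my own, not from the article. The point is that the code never asks what the host's byte order is; it only states the byte order of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored least significant byte first. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 0)  | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit value stored most significant byte first. */
static uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  | ((uint32_t)p[3] << 0);
}
```

There is no `#ifdef` on the host architecture and no pointer cast, so the same source gives the same result on little-endian and big-endian machines and has no alignment requirement on `p`.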

(By the way, in your 68k -> PPC -> Intel -> ARM example, the endianness only changes once. This was actually part of my friend's original argument, that endianness changes are even more costly than the instruction set, and platform vendors would be unwise to change it. "Modern" CPUs support both endianness types, so those shipping ARM platforms are in effect consciously deciding to be the same as Intel.)

For what it's worth, I completely agree with Pike there. It makes a lot more sense to me, and it avoids running afoul of the C standard.

> The fact that you can't know all of your future targets will support that is a problem.

To be honest, I've even seen compilation issues come up between Ubuntu 10.x and 11.x. I don't worry about future targets that much. Until they're tested, I assume they will not work.