by ScottBurson 3356 days ago
For one thing, all numbers can be written in binary, saving the lexing and conversion time. For another, strings can be written by first writing the length (in binary, of course), then writing the raw contents; there's no need to scan the input looking for the closing quote, handle backslash escapes, or do UTF-8 conversion.

That's probably most of the gain right there, but more things can be done along those lines.
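
To make the two points above concrete, here is a minimal sketch in Python using the standard `struct` module. The record layout (one 64-bit integer followed by one string with a 32-bit length prefix) and the function names `write_record` and `read_record` are assumptions made for illustration, not any particular format. Python still has to decode the bytes to get a `str`, but the reader never scans for a closing quote or processes escapes.

```python
import struct

def write_record(number: int, text: str) -> bytes:
    """Encode a signed 64-bit integer followed by a length-prefixed string."""
    payload = text.encode("utf-8")
    # "<q" = little-endian signed 64-bit int, "<I" = little-endian unsigned 32-bit length
    return struct.pack("<q", number) + struct.pack("<I", len(payload)) + payload

def read_record(buf: bytes, offset: int = 0):
    """Decode one record; returns (number, text, next_offset)."""
    (number,) = struct.unpack_from("<q", buf, offset)
    offset += 8
    (length,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    # The length tells us exactly where the string ends: no scanning, no escapes.
    text = buf[offset:offset + length].decode("utf-8")
    return number, text, offset + length
```

A string containing quotes or backslashes round-trips unchanged, since the raw bytes are never interpreted: `read_record(write_record(-42, 'say "hi"'))` gives back the same number and text.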


And no whitespace or curly braces taking up room, so the serialized data is smaller, and thus faster to transmit and store. Downside: Legibility? Future-proofing? What's that?
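
A rough way to see the size difference, sketched in Python. The record contents and the binary layout (a 32-bit id, a 32-bit count, then 64-bit floats) are made up for this example. The binary form also drops the field names, which is exactly the legibility trade-off; and for very small numbers, text can be shorter than a fixed-width binary field.

```python
import json
import struct

# Hypothetical record, chosen only for illustration.
record = {"id": 123456, "values": [1.5, 2.25, 3.125]}

# Pretty-printed JSON: field names, braces, whitespace, decimal digits.
as_json = json.dumps(record, indent=2).encode("utf-8")

# Assumed layout: uint32 id, uint32 count, then `count` float64 values.
count = len(record["values"])
as_binary = struct.pack("<II", record["id"], count) + struct.pack(
    "<%dd" % count, *record["values"]
)

print(len(as_json), len(as_binary))
```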
>Downside: Legibility? Future-proofing? What's that?

There's nothing in this practice that is against future-proofing.

Legibility, yes, but those formats are not meant to be human-readable.

Unless future-proofing is inherent in the format (as with XML, which is self-describing), the sad reality of development practices in programming shops means that one day someone will make an undocumented change or take a shortcut that tightly couples the binary format to the specific version of the code used to produce and read it. That's fine, as long as you know the coupling will happen when you're planning things. It's not so fun when you have to go back and read a 3-year-old file, only to discover that you can't.
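
One common hedge against that kind of coupling is to put a signature and an explicit version number at the front of the file, so a reader can at least tell when it is looking at something it doesn't understand. The sketch below assumes a made-up `DEMO` signature and version number; it only helps if whoever changes the format actually bumps the version, which is the discipline problem described above.

```python
import struct

MAGIC = b"DEMO"                  # hypothetical 4-byte file signature
CURRENT_VERSION = 2              # hypothetical format version
HEADER = struct.Struct("<4sH")   # signature + unsigned 16-bit version

def write_file(payload: bytes) -> bytes:
    """Prefix the payload with the signature and the current version."""
    return HEADER.pack(MAGIC, CURRENT_VERSION) + payload

def read_file(buf: bytes) -> bytes:
    """Check the header and return the payload, or raise ValueError."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, version = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO file")
    if version > CURRENT_VERSION:
        raise ValueError(
            f"file version {version} is newer than supported {CURRENT_VERSION}"
        )
    # Older versions would be dispatched to their own readers here.
    return buf[HEADER.size:]
```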

Something that comes to mind is the old COM formats that MS-Office used to use. Eventually they had to abandon them (and not just because of the EU lawsuit) because they were unmaintainable, and no one understood how they worked well enough to avoid breaking backwards compatibility in the next release.