by MS_Buys_Upvotes 3357 days ago
Can someone explain to an amateur why serialization is faster than, say, passing raw JSON?

It seems like parsing JSON would be faster than the serialize -> deserialize process, but with the popularity of things like Protobuf, it's clear that JSON is slower.

6 comments

Serialization is the process of writing arbitrary data out into a blob of some sort (binary, text, whatever) that can be read in later and processed back into the original data, possibly not by the same system. This should be considered to include even the degenerate case of just writing the content of an expanse of RAM out, as that still raises issues related to serialization.
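As an illustration of that degenerate case, here is a minimal Python sketch using the standard `array` module. `tobytes()` copies the array's raw memory, and the resulting blob silently depends on this machine's item size and byte order. The `byteswap()` step only simulates what a reader with the opposite byte order would see; the variable names are illustrative.

```python
import array
import sys

# "Serialize" by copying the array's raw memory buffer, byte for byte.
original = array.array("i", [1, 2, 3])
blob = original.tobytes()

# Nothing in the blob says how to read it: the item size and byte order
# are simply whatever this machine uses.
print(f"byteorder={sys.byteorder} itemsize={original.itemsize} blob={blob.hex()}")

# Same machine, same assumptions: the round trip works.
same_machine = array.array("i")
same_machine.frombytes(blob)
assert same_machine == original

# Simulate a reader whose CPU has the opposite byte order: it sees each
# item with its bytes reversed, so the values come out wrong.
other_machine = array.array("i")
other_machine.frombytes(blob)
other_machine.byteswap()
print(other_machine.tolist())
```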

"JSON serialization" and "Java serialization" are two different things that can accomplish that goal. From your second paragraph, it sounds like you believe there is some fundamental difference between Java serialization and JSON serialization, but there isn't. There is a whole host of non-fundamental differences you always have to consider with a serialization format (speed, what can be represented, how circular data structures are handled, whether untrusted data can be read safely), but no fundamental difference.

JSON is a serialization format, just one that at least nods in the direction of human-readability. Formats which don't worry about human readability (e.g. Protobuf) can gain various degrees of efficiency.
For one thing, all numbers can be written in binary, saving the lexing and conversion time. For another, strings can be written by first writing the length (in binary, of course), then writing the raw contents; there's no need to scan the input looking for the closing quote, handle backslash escapes, or do UTF-8 conversion.

That's probably most of the gain right there, but more things can be done along those lines.
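A minimal Python sketch of the two ideas above (binary numbers and length-prefixed strings), using the standard `struct` module. The helper names and the choice of a 4-byte length and 8-byte integers are assumptions for illustration; this is not Protobuf's actual wire format, which uses variable-length integers.

```python
import json
import struct

def encode_string(s: str) -> bytes:
    """Length-prefixed string: 4-byte big-endian length, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack(">I", len(data)) + data

def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read the length, then slice exactly that many bytes.
    No scanning for a closing quote, no backslash escapes to undo."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length

def encode_int(n: int) -> bytes:
    """Fixed-width 8-byte signed integer stored in binary, not as digit characters."""
    return struct.pack(">q", n)

def decode_int(buf: bytes, offset: int = 0) -> tuple[int, int]:
    (value,) = struct.unpack_from(">q", buf, offset)
    return value, offset + 8

# Usage: write a string and a number back to back, then read them in order.
message = encode_string('say "hi"\n') + encode_int(-42)
text, pos = decode_string(message)
number, pos = decode_int(message, pos)
print(text.strip(), number)

# JSON, by contrast, has to escape the quotes and newline, and the reader must undo that.
print(json.dumps('say "hi"\n'))
```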

And no whitespace or curly braces taking up room, so the serialized data is smaller, and thus faster to transmit/store. Downside: Legibility? Future-proofing? What's that?
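A rough illustration of the size difference in Python, assuming a hypothetical three-field record and a binary layout both sides agree on in advance (`struct` format `>id?`: a 4-byte int, an 8-byte float and a 1-byte bool):

```python
import json
import struct

# A hypothetical record: an id, a temperature reading and a flag.
record = {"id": 1234, "temperature": 21.5, "active": True}

pretty = json.dumps(record, indent=2).encode("utf-8")                # human-friendly JSON
compact = json.dumps(record, separators=(",", ":")).encode("utf-8")  # JSON without spaces
# Binary layout fixed in advance: no field names, braces or whitespace at all.
packed = struct.pack(">id?", record["id"], record["temperature"], record["active"])

print(len(pretty), len(compact), len(packed))
print(struct.unpack(">id?", packed))
```

The packed form is the smallest, but it also shows the downside raised here: nothing in those 13 bytes says what they mean, so the reader must already know the layout.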

>Downside: Legibility? Future-proofing? What's that?

There's nothing in this practice that is against future-proofing.

Legibility, yes, but those formats are not meant to be human readable.
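Future-proofing a binary format is mostly a matter of design. Here is a minimal Python sketch, assuming a hypothetical tag-length-value layout (1-byte tag, 2-byte big-endian length, then the payload). Because every field carries its own tag and length, an older reader can skip fields added by a newer writer. Protobuf relies on the same idea of tagged fields, though its real wire format differs; the field names here are made up.

```python
import struct

def encode_fields(fields: dict[int, bytes]) -> bytes:
    """Write each field as: tag (1 byte), payload length (2 bytes), payload."""
    out = bytearray()
    for tag, payload in fields.items():
        out += struct.pack(">BH", tag, len(payload)) + payload
    return bytes(out)

def decode_known(buf: bytes, known_tags: set[int]) -> dict[int, bytes]:
    """Keep the fields this reader understands and skip the rest."""
    result = {}
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(">BH", buf, offset)
        offset += 3
        payload = buf[offset:offset + length]
        offset += length  # the length says how far to jump, known tag or not
        if tag in known_tags:
            result[tag] = payload
    return result

# A "version 2" writer adds PHONE, which a "version 1" reader has never heard of.
NAME, EMAIL, PHONE = 1, 2, 3
v2_message = encode_fields({NAME: b"Ada", EMAIL: b"ada@example.com", PHONE: b"555-0100"})

v1_view = decode_known(v2_message, known_tags={NAME, EMAIL})
print(v1_view)
```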

Unless future-proofing is inherent in the format (as with XML, which is self-describing), the sad reality of development practices in programming shops means that one day, someone will make an undocumented change or take a shortcut that tightly couples the binary format to the specific version of the code used to produce and read it. That's fine, as long as you know the coupling will happen when you're planning things. It's not so fun when you have to go back and read a 3-year-old file, only to discover that you can't.

Something that comes to mind is the old COM formats that MS Office used. Eventually Microsoft had to abandon them (and not just because of the EU lawsuit) because they were unmaintainable, and no one understood how they worked well enough to avoid breaking backwards compatibility in the next release.

Either way, it's serialization: object serialization or JSON serialization.

However, independently of how the data is represented (JSON or one of the many binary formats), the key is to encode/decode only what you actually care about. From the little I understand about Java object serialization, a lot of extra information gets encoded (class descriptors, for example), which may not be needed at all for the application at hand.
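A minimal Python sketch of the difference between serializing everything an object holds and serializing only what the receiver needs. Python's `pickle` stands in for a generic object serializer here (it is not Java serialization), and `Reading` with its fields is hypothetical.

```python
import pickle
import struct
from dataclasses import dataclass, field

@dataclass
class Reading:
    sensor_id: int
    value: float
    # Hypothetical cached data that the receiving side never uses.
    history: list = field(default_factory=list)

reading = Reading(sensor_id=7, value=3.25, history=[3.0, 3.1, 3.2] * 100)

# Serialize everything the object holds: every attribute, with its name.
everything = pickle.dumps(vars(reading))

# Serialize only what the application needs: two fields, fixed binary layout.
needed_only = struct.pack(">Id", reading.sensor_id, reading.value)

print(len(everything), len(needed_only))

# The receiver rebuilds just the part it cares about.
sensor_id, value = struct.unpack(">Id", needed_only)
print(sensor_id, value)
```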

For an example of efficient serialization techniques, take a look at some of the MPEG formats (the older ones are easier to grok). They have a neat way of representing what is needed and dealing with optional data.
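One common way to handle optional data is a set of flag bits up front that say which optional fields follow, so an absent field costs one bit rather than any bytes. The Python sketch below is a generic illustration of that idea under assumed field sizes, not the actual syntax of any MPEG format; the flag names and record layout are hypothetical.

```python
import struct
from typing import Optional

HAS_TIMESTAMP = 0x01  # bit 0: an 8-byte timestamp follows
HAS_LABEL = 0x02      # bit 1: a 1-byte length and a label (up to 255 bytes) follow

def encode(value: int, timestamp: Optional[int] = None, label: Optional[bytes] = None) -> bytes:
    flags = (HAS_TIMESTAMP if timestamp is not None else 0) | (HAS_LABEL if label is not None else 0)
    out = struct.pack(">BH", flags, value)  # flags byte + mandatory 2-byte value
    if timestamp is not None:
        out += struct.pack(">Q", timestamp)
    if label is not None:
        out += struct.pack(">B", len(label)) + label
    return out

def decode(buf: bytes) -> dict:
    flags, value = struct.unpack_from(">BH", buf, 0)
    offset = 3
    result = {"value": value}
    if flags & HAS_TIMESTAMP:
        (result["timestamp"],) = struct.unpack_from(">Q", buf, offset)
        offset += 8
    if flags & HAS_LABEL:
        (length,) = struct.unpack_from(">B", buf, offset)
        offset += 1
        result["label"] = buf[offset:offset + length]
    return result

print(len(encode(42)), len(encode(42, timestamp=1_700_000_000)))
print(decode(encode(42, label=b"hi")))
```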

JSON must also be serialized and deserialized. Parsing it is slow, tricky to get right, and not cache-friendly.

Protobuf has the benefit of being extremely compact and, in some important languages, fast and friendly to serialize and deserialize.
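Part of Protobuf's compactness comes from encoding integers as base-128 varints, so small numbers take a single byte. A minimal Python sketch of that encoding for non-negative integers only; the function names are hypothetical, and real Protobuf libraries handle much more (signed values, field tags, and so on).

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)

def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

print(encode_varint(1).hex(), encode_varint(300).hex())  # 01 ac02
```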

Thanks! You and the others are right: I didn't know JSON was serialized.

I can see why something in binary would be faster than structured text (think assembler vs Python).

Thanks again.

>I didn't know JSON was serialized.

Think of it like this: any time you take stuff from your program's memory (arrays, lists, strings, etc.) and export it in a textual or binary format that can be exchanged between programs, sent over the network, saved to a file, etc., that's serialization.

JSON is a serialization format itself. Even JavaScript, which has JSON-like structures as native objects, has to serialize them to text (JSON.stringify) and deserialize them from text (JSON.parse).
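Python's standard `json` module offers the same pair under different names, `json.dumps` and `json.loads`. A minimal round trip showing an in-memory dict becoming plain text and then a fresh object again (the data is made up):

```python
import json

# An in-memory structure: a dict holding a string, a list and a boolean.
data = {"name": "Ada", "scores": [90, 85], "active": True}

text = json.dumps(data)      # serialize: Python objects -> str
print(type(text).__name__, text)

restored = json.loads(text)  # deserialize: str -> new Python objects
print(restored == data, restored is data)
```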