by niccaluim 3633 days ago
I view them unfavorably.

Ha! It is funny because it is true. I remember listening to one of Joe Armstrong's talks (React 2014 conference), where at some point ( https://youtu.be/rQIE22e0cW8?t=2009 ) he explains how parsing is expensive both in CPU and in bandwidth, especially on mobile networks. The company he works for controls the data paths from smartphones to the internet, and they sweat over every wasted bit because it eats into the precious bandwidth available to consumers. And what do developers do? They shove JSON through that channel at the application level!

It was a silly observation, but it is also true at some level. JSON might be easy to read, but reading a 100 MB JSON file still needs a special editor.

Another funny observation Joe made, replying to people who pushed back when he called JSON out ("after all, you can see JSON"), was that your eyes cannot see JSON; they see photons bouncing off the screen. You still use an editor or some other translation program to display it and read it. So at that point you might as well use binary (Thrift, protobufs, SQLite, ...).
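To put a rough number on the bandwidth point, here is a minimal sketch using made-up data: the same 1,000 floats encoded as compact JSON and as raw little-endian float32 values packed with Python's `struct` module. This is plain struct packing, not Thrift or protobufs, and the size difference depends heavily on the data (compression would also narrow it). Decoding the binary side is a single `unpack_from` call with no tokenizing.

```python
import json
import struct

# Hypothetical payload: 1,000 readings that are exactly representable as float32.
readings = [i * 0.5 for i in range(1000)]

# Text encoding: JSON with no whitespace after separators.
as_json = json.dumps(readings, separators=(",", ":")).encode("utf-8")

# Binary encoding: 4-byte little-endian count, then packed float32 values.
as_binary = struct.pack(f"<I{len(readings)}f", len(readings), *readings)

# Decoding the binary form: read the count, then unpack the values in one call.
(count,) = struct.unpack_from("<I", as_binary, 0)
decoded = struct.unpack_from(f"<{count}f", as_binary, 4)

print("JSON bytes:", len(as_json), "binary bytes:", len(as_binary))
```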

I personally think that's one of the more sane replies.

It is really important to chop up your JSON files into smaller sub-files. This not only makes them easier to back up and read manually, but will usually give you a speed boost (you can read from and write to more than one part of the "db" at a time).
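A minimal sketch of the chopping idea, assuming the "db" is a single JSON object whose top-level keys are safe to use as file names. `split_db` and `load_part` are hypothetical helpers, not from any library. Each part is written to a temporary file and then swapped in with `os.replace`, so a reader of one part never sees it half-written (on filesystems where rename is atomic), and work on different parts never touches the same file.

```python
import json
import os
import tempfile
from pathlib import Path


def split_db(db: dict, out_dir: Path) -> list:
    """Write each top-level key of `db` to its own <key>.json file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for key, value in db.items():
        final = out_dir / f"{key}.json"
        tmp = out_dir / f"{key}.json.tmp"
        # Write to a temporary file first, then swap it into place.
        tmp.write_text(json.dumps(value, indent=2), encoding="utf-8")
        os.replace(tmp, final)
        paths.append(final)
    return paths


def load_part(out_dir: Path, key: str):
    """Load one part of the "db" without reading any of the others."""
    return json.loads((out_dir / f"{key}.json").read_text(encoding="utf-8"))


# Usage: split a small, made-up database into a fresh temporary directory.
demo_dir = Path(tempfile.mkdtemp())
split_db({"users": [{"id": 1, "name": "ada"}], "orders": []}, demo_dir)
print(load_part(demo_dir, "users"))  # [{'id': 1, 'name': 'ada'}]
```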

This is probably sensible. JSON is a text-based format that requires you to parse the entire thing just to get an outline, unlike well-designed binary formats.
Well-designed text formats too?
Not necessarily. (I'd actually argue "not at all.") Presumably you have a text format in the first place because you want your representation to be human-readable [with common tools like a text editor], and very likely human-writable as well. Those are the real constraints of a (useful) text format, and they tend to be in direct conflict with high-performance parsing, or partial parsing.

For an arbitrary example, a binary format could have an index table of objects at the start of the file, and then you could perform partial reads to access only the subset of objects you care about. You could do that in a text format too, but if the file is edited in a text editor you can't guarantee that the user remembered or bothered to update the index when they added a new object. The parser would effectively not be able to trust the index and would have to parse the entire file. (I suppose you could use CRCs or something to enforce this, but then you'd end up with a very brittle format that frustrates anyone who tries to edit it.)
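A minimal sketch of such an index-first layout, using a made-up container format (the `IDX1` signature and field widths are assumptions, not any real standard): a header, a fixed-size index of `(id, offset, length)` entries, then the payloads. Payloads are JSON here only for brevity. The reader parses the header and index, then seeks straight to one object, trusting the index exactly as written.

```python
import io
import json
import struct
from typing import Optional

MAGIC = b"IDX1"                 # hypothetical 4-byte file signature
HEADER = struct.Struct("<4sI")  # magic, number of objects
ENTRY = struct.Struct("<IQI")   # object id, absolute byte offset, payload length


def write_indexed(objects: dict) -> bytes:
    """Serialize as: header | index table | concatenated payloads."""
    blobs = [(oid, json.dumps(obj).encode("utf-8")) for oid, obj in objects.items()]
    # Payloads begin right after the header and the fixed-size index table.
    offset = HEADER.size + ENTRY.size * len(blobs)
    index = []
    for oid, blob in blobs:
        index.append(ENTRY.pack(oid, offset, len(blob)))
        offset += len(blob)
    return HEADER.pack(MAGIC, len(blobs)) + b"".join(index) + b"".join(b for _, b in blobs)


def read_one(f, wanted_id: int) -> Optional[dict]:
    """Read only the header and index, then seek straight to one object's bytes."""
    f.seek(0)
    magic, count = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not an indexed file")
    for _ in range(count):
        oid, offset, length = ENTRY.unpack(f.read(ENTRY.size))
        if oid == wanted_id:
            # The index is trusted as written; no other payload is parsed.
            f.seek(offset)
            return json.loads(f.read(length))
    return None


data = write_indexed({1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}})
print(read_one(io.BytesIO(data), 2))  # {'name': 'b'}
```

The index lookup here is a linear scan; sorting entries by id would allow a binary search over fixed-size entries instead.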

Really, the true advantage of a binary format is you generally assume that nobody messed with the data behind your back, so you can have duplicate data (like an index) if you want without worrying that it's out of sync. This pretty much goes hand in hand with the fact that you can't just open it in a text editor and fiddle with stuff.

TL;DR: Human-writability and high performance are arguably mutually exclusive features.

> Really, the true advantage of a binary format is you generally assume that nobody messed with the data behind your back

I would rephrase that a bit and say the true advantage is flexibility, as you're not subject to the constraints of textual data.

The integrity of the data is a separate matter, and should be carefully verified rather than trusted implicitly. A huge amount of security vulnerabilities, and program crashes in general, come from errantly assuming that user-supplied data is correct.
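One way to act on that, sketched under assumptions rather than taken from any particular format: a hypothetical record layout of `<u32 length><u32 CRC-32><payload>` whose parser checks every declared length against the bytes actually available and verifies the checksum before handing data back. Note that `zlib.crc32` catches accidental corruption only; defending against deliberate tampering would need a MAC or signature.

```python
import struct
import zlib

RECORD_HEADER = struct.Struct("<II")  # payload length, CRC-32 of payload


def encode_record(payload: bytes) -> bytes:
    return RECORD_HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_records(buf: bytes) -> list:
    """Parse back-to-back records, rejecting malformed input instead of trusting it."""
    records = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < RECORD_HEADER.size:
            raise ValueError(f"truncated header at offset {pos}")
        length, crc = RECORD_HEADER.unpack_from(buf, pos)
        pos += RECORD_HEADER.size
        # Never read past the end just because a length field says so.
        if length > len(buf) - pos:
            raise ValueError(f"length {length} runs past end of buffer at offset {pos}")
        payload = buf[pos:pos + length]
        if zlib.crc32(payload) != crc:
            raise ValueError(f"checksum mismatch at offset {pos}")
        records.append(payload)
        pos += length
    return records


good = encode_record(b"hello") + encode_record(b"world")
print(decode_records(good))  # [b'hello', b'world']

# Flip one byte of the last payload: the CRC check rejects it.
corrupted = bytearray(good)
corrupted[-1] ^= 0xFF
try:
    decode_records(bytes(corrupted))
except ValueError as err:
    print("rejected:", err)
```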

Indeed. Another example is that you could have a length field in a text format that precedes a string, so you can skip over it without parsing it. But humans will forget to update it, or update it incorrectly.
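An existing format built on this idea is D. J. Bernstein's netstrings, which write a string as `<decimal length>:<bytes>,`. The Python sketch below loosely follows that layout (the function names are my own). It shows both the benefit, jumping straight past a payload without scanning it, and the failure mode described above: a hand edit that changes the data without updating the length gets rejected.

```python
def encode_netstring(data: bytes) -> bytes:
    """Encode bytes as <decimal length>:<data>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes, pos: int = 0) -> tuple:
    """Return (payload, next_pos); the payload bytes themselves are never scanned."""
    colon = buf.index(b":", pos)  # raises ValueError if there is no length prefix
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("length prefix must be decimal digits")
    length = int(digits)
    start = colon + 1
    end = start + length
    # Jump straight to where the terminator should be and check it.
    if buf[end:end + 1] != b",":
        raise ValueError(f"declared length {length} does not match the data")
    return buf[start:end], end + 1


stream = encode_netstring(b"hello") + encode_netstring(b'{"a": 1}')
first, pos = decode_netstring(stream)
second, pos = decode_netstring(stream, pos)
print(first, second)  # b'hello' b'{"a": 1}'

# A human edits "hello" to "hello!" but forgets to update the "5:" prefix.
try:
    decode_netstring(b"5:hello!,")
except ValueError as err:
    print("rejected:", err)
```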