Hacker News new | ask | show | jobs
by fstrthnscnd 1694 days ago
> Plus, as I wrote elsewhere, gzipping your JSON will result in essentially "avoiding having to repeat every field name" by dictionary coding it.

Gzipping indeed helps in getting mostly back the space taken by the field names, but a parser will still have to parse these strings. On a large document, this might have a performance impact.

One good side of having the field names however is that one can reorder them adlib.

1 comments

That's true, but the main argument made by the website is about the space advantage, so it's very relevant that that space advantage is basically nullified by the widespread use of compression.

If your worry is parsing speed, then JSON not only has battle-tested parsers, but also has SIMD-assisted parsers which can process gigabytes a second on a single core (e.g. https://github.com/simdjson/simdjson). It would take Internet Object years to develop parsers as performant as that, even if it did, by some miracle, achieve wide uptake. So the notional advantage afforded by not having keys on each row is neither here nor there.

And incidentally, as someone who's written a handful of parsers, I suspect that this scheme would not be particularly easy to parse. You need lookahead because of optional fields, as well as maintaining state and a lookup table for mapping positions to keys, etc. I can draw up a quick parser in pseudocode or Python to explain, if you disagree.

> If your worry is parsing speed

I am not personally worried by perf in either case, but I see your point.

> It would take Internet Object years to develop parsers as performant as that

Well, implementing a JSON parser is arguably difficult, for many reasons, I suppose the main one is the flexibility it provides. I don't know if this would be the case for this format however. TBH, I doesn't seem to add too much to CSV, and perhaps it would be simpler to use CSV with the first line of this format has a hint for the data structure.