Hacker News

by samhw 1704 days ago

> If a compact columnar representation is what you're after to avoid having to repeat every field name in an array of objects (which CSV is good for)

Plus, as I wrote elsewhere, gzipping your JSON will in effect avoid "having to repeat every field name" by dictionary coding it: DEFLATE's LZ77 stage replaces each repeated key with a short back-reference. The only case in which that wouldn't be true is extremely unusual, heterogeneous data, but this format doesn't seem to support such data at all.
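To make the compression point concrete, here is a minimal sketch (Python standard library only) that compares an array of keyed objects with a columnar version (one header row of keys, then one value array per record), before and after gzip. The sample records and the `make_rows`/`encoded_sizes` helpers are hypothetical illustrations, not taken from either format's documentation, and exact byte counts will vary with the data.

```python
import gzip
import json


def make_rows(n):
    # Hypothetical sample data: homogeneous records sharing the same three keys.
    return [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(n)]


def encoded_sizes(rows):
    keys = list(rows[0])
    # Keyed form: every field name is repeated on every record.
    keyed = json.dumps(rows).encode("utf-8")
    # Columnar form: field names appear once, in a header row.
    columnar = json.dumps([keys] + [[r[k] for k in keys] for r in rows]).encode("utf-8")
    return {
        "raw_keyed": len(keyed),
        "raw_columnar": len(columnar),
        "gz_keyed": len(gzip.compress(keyed)),
        "gz_columnar": len(gzip.compress(columnar)),
    }


if __name__ == "__main__":
    s = encoded_sizes(make_rows(1000))
    print(s)
    print("raw overhead of repeated keys:", s["raw_keyed"] - s["raw_columnar"])
    print("gzipped overhead of repeated keys:", s["gz_keyed"] - s["gz_columnar"])
```

On homogeneous records like these, the raw size gap is large, while the gzipped gap is typically a small fraction of it; how small depends on the data.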

I'm also mystified that the author claims this is readable. It looks eminently unreadable compared with JSON, if you have anything beyond one row of very simple data with all optional fields present. And, in that case, it's basically just 'JSON with the keys on a different row'.

(Congrats to the author, but this is more of a fun personal project rather than something to seriously present as a 'JSON killer'. If you do present it as a JSON killer, then you have to expect a rigorous review.)


fstrthnscnd 1703 days ago

> Plus, as I wrote elsewhere, gzipping your JSON will result in essentially "avoiding having to repeat every field name" by dictionary coding it.

Gzipping indeed helps recover most of the space taken by the field names, but a parser will still have to parse these strings. On a large document, this might have a performance impact.
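A minimal sketch of why compression doesn't help the parser: the gzip stream has to be decompressed before parsing, so every repeated key is back in the text. The hypothetical `count_decoded_keys` helper uses the standard library's `object_pairs_hook` to count how many key strings `json.loads` decodes.

```python
import gzip
import json


def count_decoded_keys(compressed: bytes) -> int:
    # The parser never sees the compressed bytes: decompression restores
    # every repeated field name before parsing starts.
    text = gzip.decompress(compressed).decode("utf-8")
    seen = 0

    def hook(pairs):
        nonlocal seen
        seen += len(pairs)  # one decoded key string per (key, value) pair
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return seen


rows = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(1000)]
payload = gzip.compress(json.dumps(rows).encode("utf-8"))
print(count_decoded_keys(payload))  # 3 keys x 1000 rows -> 3000
```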

One upside of having the field names, however, is that one can reorder them at will.
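A small sketch of that upside, using hypothetical sample values: keyed objects decode to the same record regardless of field order, while a positional row whose values arrive in a different order silently maps them to the wrong keys.

```python
import json

header = ["id", "name", "active"]

# Keyed objects: field order doesn't matter, the keys say which value is which.
a = json.loads('{"id": 1, "name": "ann", "active": true}')
b = json.loads('{"active": true, "name": "ann", "id": 1}')
print(a == b)  # True

# Positional row: the same values in a different order land on the wrong keys.
row = json.loads('["ann", 1, true]')
decoded = dict(zip(header, row))
print(decoded)  # {'id': 'ann', 'name': 1, 'active': True}
```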


samhw 1703 days ago

That's true, but the main argument made by the website is about the space advantage, so it's very relevant that that space advantage is basically nullified by the widespread use of compression.

If your worry is parsing speed, then JSON not only has battle-tested parsers, but also has SIMD-assisted parsers which can process gigabytes a second on a single core (e.g. https://github.com/simdjson/simdjson). It would take Internet Object years to develop parsers as performant as that, even if it did, by some miracle, achieve wide uptake. So the notional advantage afforded by not having keys on each row is neither here nor there.

And incidentally, as someone who's written a handful of parsers, I suspect that this scheme would not be particularly easy to parse. You need lookahead because of optional fields, as well as maintaining state and a lookup table for mapping positions to keys, etc. I can draw up a quick parser in pseudocode or Python to explain, if you disagree.
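A minimal sketch of the state such a parser carries, for a hypothetical simplified format (not the actual Internet Object grammar): a header line of field names with `?` marking optional fields, followed by comma-separated rows in which an optional field may be left empty or omitted at the end. The parser builds a position-to-key lookup table from the header and consults it for every row. It splits naively on commas and keeps values as strings, so it skips the tokenizer with lookahead that quoted strings or nested values would require.

```python
from __future__ import annotations

from typing import Any


def parse_schema(header: str) -> list[tuple[str, bool]]:
    # Position-to-key lookup table: index i -> (field name, is_optional).
    table = []
    for raw in header.split(","):
        name = raw.strip()
        table.append((name.rstrip("?"), name.endswith("?")))
    return table


def parse_row(line: str, table: list[tuple[str, bool]]) -> dict[str, Any]:
    values = [v.strip() for v in line.split(",")]  # naive: no quoted commas
    if len(values) > len(table):
        raise ValueError(f"too many values: {line!r}")
    record = {}
    for pos, (key, optional) in enumerate(table):
        value = values[pos] if pos < len(values) else ""
        if value == "":
            if not optional:
                raise ValueError(f"missing required field {key!r}: {line!r}")
            continue  # absent optional field: leave the key out entirely
        record[key] = value
    return record


def parse_document(text: str) -> list[dict[str, Any]]:
    lines = [ln for ln in text.splitlines() if ln.strip()]
    table = parse_schema(lines[0])  # state carried across every row
    return [parse_row(ln, table) for ln in lines[1:]]


doc = "name, age?, email?\nAlice, 30, alice@example.com\nBob\nCarol, , carol@example.com\n"
print(parse_document(doc))
```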


fstrthnscnd 1702 days ago

> If your worry is parsing speed

I am not personally worried about perf in either case, but I see your point.

> It would take Internet Object years to develop parsers as performant as that

Well, implementing a JSON parser is arguably difficult for many reasons; I suppose the main one is the flexibility it provides. I don't know if the same would be true of this format, however. TBH, it doesn't seem to add much to CSV, and perhaps it would be simpler to use CSV with the first line of this format as a hint for the data structure.
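A minimal sketch of that idea, assuming a hypothetical `field:type` syntax for the hint line (Internet Object's real schema syntax is richer): the first line names the fields and their types, and everything after it is parsed by Python's standard `csv` module, which already handles quoting.

```python
import csv
import io

# Hypothetical converters for the type names allowed in the hint line.
CONVERTERS = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda s: s.strip().lower() == "true",
}


def read_hinted_csv(text):
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    hint = next(reader)  # first line: "field:type" items (hypothetical syntax)
    schema = []
    for item in hint:
        name, _, type_name = item.partition(":")
        schema.append((name.strip(), CONVERTERS[type_name.strip() or "str"]))
    rows = []
    for values in reader:
        if not values:
            continue  # csv.reader yields [] for blank lines
        if len(values) != len(schema):
            raise ValueError(f"expected {len(schema)} values, got {values!r}")
        rows.append({name: conv(v) for (name, conv), v in zip(schema, values)})
    return rows


text = 'name:str, age:int, active:bool\n"Doe, Jane", 41, true\nBob, 7, false\n'
print(read_hinted_csv(text))
```

Keeping the body plain CSV means existing CSV tooling can still read the data, with the hint line as the only extra convention.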
