by flqn 1704 days ago
I'm sceptical about the value proposition of this without seeing much more than a simple example that offers little over existing hypermedia+json/csv practices.

If a compact columnar representation is what you're after, to avoid having to repeat every field name in an array of objects (which CSV is good for), but you don't want to give up the ability to include metadata in your JSON, there are a ton of different ways to structure your document to solve this issue without inventing new document formats.
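As one illustration of the point above, here is a minimal sketch of a columnar layout in plain JSON. The key names `meta`, `fields` and `rows` and the helper names are assumptions made up for this example, not part of any standard or of the format under discussion.

```python
import json


def to_columnar(records, meta=None):
    """Pack a list of uniform dicts into a {"meta", "fields", "rows"} document.

    The layout is hypothetical: field names are stored once in "fields",
    and each record becomes a positional list in "rows".
    """
    fields = list(records[0].keys()) if records else []
    rows = [[rec.get(name) for name in fields] for rec in records]
    return {"meta": meta or {}, "fields": fields, "rows": rows}


def from_columnar(doc):
    """Rebuild the list of dicts from the columnar document."""
    return [dict(zip(doc["fields"], row)) for row in doc["rows"]]


people = [
    {"name": "Ann", "age": 31, "city": "Leeds"},
    {"name": "Bob", "age": 45, "city": "York"},
]

doc = to_columnar(people, meta={"count": len(people)})
print(json.dumps(doc))
```

The document is still ordinary JSON, so any existing parser reads it, and the `meta` object leaves room for links or type hints alongside the data.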

Also this example is unclear (possibly ambiguous?); how is "int" as a type for the "age" column distinguished from "street", "city", etc as what I assume are field names?

3 comments

> If a compact columnar representation is what you're after to avoid having to repeat every field name in an array of objects (which CSV is good for)

Plus, as I wrote elsewhere, gzipping your JSON will result in essentially "avoiding having to repeat every field name" by dictionary coding it. The only case in which that wouldn't be true is when dealing with extremely unusual and heteromorphic data, but then this format doesn't seem to support such data at all.
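A rough way to check this claim yourself is sketched below. The sample records are made up for illustration, and the exact byte counts will depend on your data and compression level, so treat it as an experiment rather than a benchmark.

```python
import gzip
import json

# Made-up sample data: 1000 uniform records with repeated field names.
records = [
    {"name": f"user{i}", "age": 20 + i % 50, "city": "Leeds", "active": i % 2 == 0}
    for i in range(1000)
]

# The same data with field names stored once (a hypothetical columnar layout).
fields = list(records[0])
columnar = {"fields": fields, "rows": [[r[f] for f in fields] for r in records]}


def sizes(obj):
    """Return (raw_bytes, gzipped_bytes) for the compact JSON encoding of obj."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


obj_raw, obj_gz = sizes(records)
col_raw, col_gz = sizes(columnar)

print(f"array of objects: raw={obj_raw} gzip={obj_gz}")
print(f"columnar:         raw={col_raw} gzip={col_gz}")
print(f"bytes saved by dropping keys: raw={obj_raw - col_raw} gzip={obj_gz - col_gz}")
```

On uniform data like this, the saving from dropping the keys should shrink considerably once both versions are gzipped, which is the effect described above.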

I'm also mystified that the author claims this is readable. It looks eminently unreadable compared with JSON, if you have anything beyond one row of very simple data with all optional fields present. And, in that case, it's basically just 'JSON with the keys on a different row'.

(Congrats to the author, but this is more of a fun personal project than something to seriously present as a 'JSON killer'. If you do present it as a JSON killer, then you have to expect a rigorous review.)

> Plus, as I wrote elsewhere, gzipping your JSON will result in essentially "avoiding having to repeat every field name" by dictionary coding it.

Gzipping does indeed recover most of the space taken by the field names, but a parser will still have to parse these strings. On a large document, this might have a performance impact.

One upside of having the field names, however, is that one can reorder them ad lib.

That's true, but the main argument made by the website is about the space advantage, so it's very relevant that that space advantage is basically nullified by the widespread use of compression.

If your worry is parsing speed, then JSON not only has battle-tested parsers, but also has SIMD-assisted parsers which can process gigabytes a second on a single core (e.g. https://github.com/simdjson/simdjson). It would take Internet Object years to develop parsers as performant as that, even if it did, by some miracle, achieve wide uptake. So the notional advantage afforded by not having keys on each row is neither here nor there.

And incidentally, as someone who's written a handful of parsers, I suspect that this scheme would not be particularly easy to parse. You need lookahead because of optional fields, as well as maintaining state and a lookup table for mapping positions to keys, etc. I can draw up a quick parser in pseudocode or Python to explain, if you disagree.

> If your worry is parsing speed

I am not personally worried by perf in either case, but I see your point.

> It would take Internet Object years to develop parsers as performant as that

Well, implementing a JSON parser is arguably difficult, for many reasons; I suppose the main one is the flexibility it provides. I don't know if this would be the case for this format, however. TBH, it doesn't seem to add too much to CSV, and perhaps it would be simpler to use CSV with the first line of this format as a hint for the data structure.
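The "CSV with a typed first line" idea could look roughly like the sketch below. The `name:type` header convention, the `CASTS` table and `read_typed_csv` are all assumptions invented for this example; they are not taken from the Internet Object format or from any CSV standard.

```python
import csv
import io

# Hypothetical type names allowed in the header line.
CASTS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda s: s.strip().lower() == "true",
}


def read_typed_csv(text):
    """Parse CSV whose header cells look like 'name:type' (assumed convention).

    A header cell without a ':type' suffix is treated as a string column.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = []
    for cell in header:
        name, _, type_name = cell.partition(":")
        columns.append((name.strip(), CASTS[type_name.strip() or "str"]))
    return [
        {name: cast(value) for (name, cast), value in zip(columns, row)}
        for row in reader
    ]


sample = "name:str,age:int,active:bool\nAnn,31,true\nBob,45,false\n"
rows = read_typed_csv(sample)
print(rows)
```

This leans entirely on the standard `csv` module for quoting and escaping, so the only new logic is the header parsing and the per-column cast.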

Looked for a spec, but couldn’t find it, so here’s a _guess_: there’s significant whitespace between the colon and the opening brace:

  age:{int, min:20},
  address: {street, city, state}

Alternatively, there may be a set of forbidden field names, including bool, int and string.

Of these two, I like neither, but would opt for the latter.

I also considered that min:20 implied the previous had to be a type, but I don’t see how that’s consistent with

  active?:bool

and

  tags?:[string]

I agree. CSV + Metadata/field types (which JSON can handle) plus zipping (dictionary coding) takes care of, what, 99.9999% of the issues folks have with one type or the other?