Hacker News new | ask | show | jobs
by OJFord 1847 days ago
Yeah, lists of objects are pretty crap, because they're almost always homogeneous but unenforcédly so; it's not just a size issue but a parsing (or not - usage) issue too.

You could approximate CSV in a JSON response like:

    {
      "columns": ["a", ..., "z"],
      "rows": [[1, ..., 26], ..., [11, 2266]]
    }
Or:

    {
      "a": [1, ..., 11],
      ...
      "z": [26, ..., 2266]
    }
which I've never seen, but would save space, and sort of enforced in the sense that if you trust your serialiser for it as much as you trust an equivalent CSV serialiser, it's fine. (But the same argument could be made for more usual JSON object lists. Only arguable difference is that there's more of an assertion to the client that they should be expected to be homogeneous.)
4 comments

A columnar structure is definitely popular in the analytics community, although there are binary formats that are faster to scan again and can be mmap'd for zero-copy access.

The fundamental problem with CSV is that it has no canonical form or formal construction. Even the RFC documents it by example and historical reference, rather than from first principles, and does so with liberal use of "maybe". Consequently being very easy to fling about as a human but much harder to reason about in the abstract, and this is most evident when you get bogged down in the gritty details of writing a CSV importer for your application.

+1 - If you are looking for something JSON-compatible, JSON Lines (one json object per line - https://jsonlines.org/) is pretty popular as well.

You could store a CSV in JSONL very easily. In fact, jsonlines' website shows it as its first example: https://jsonlines.org/examples/

And if you wanted something that is json file-wide, you can just add some commas and wrap in [] for a list of rows.

jsonlines (and ndjson too, since they're overlapping specs) is amazing. It's definitely the better way to convert between JSON and CSV.

  » ps aux | grep root | head -n5 | format jsonl
  ["root","87596","0.0","0.0","4359648","116","??","Ss","10:01am","0:00.02","com.apple.cmio.registerassistantservice"]
  ["root","81777","0.0","0.0","4321932","88","??","Ss","Wed12am","0:00.01","PlugInLibraryService"]
  ["root","71784","0.0","0.1","4365572","10504","??","Ss","Tue11pm","0:25.88","PerfPowerServices"]
  ["root","42906","0.0","0.0","4321572","88","??","Ss","Tue09am","0:00.01","com.apple.ColorSyncXPCAgent"]
  ["root","47415","0.0","0.0","4303172","88","??","Ss","Sat04am","0:00.01","aslmanager"]

It has the readability of CSV but the stricter formatting of JSON. Win win.
i've known about json lines for a bit (had to make a utility that parsed some output of a program that used jsonlines and was irritated it didn't use the json spec)...

but for some reason just this post made me realize something... i've always been irritated by CSV log files (or space delimited), but would prefer something more structured like JSON, but JSON log files are pretty obnoxious for the reasons mentioned in the comments, a lot of data duplication...

JSON lines for log files seems like a great fit, not sure why i didn't realize this until now, i suppose it was the context of the discussion!

Oh no, this is so similar to http://ndjson.org/
ndjson appears to be a fork of jsonlines with the addition of a spec (see https://github.com/ndjson/ndjson.github.io/issues/1)
I’ll note, your second item here is a structure of arrays, which is often higher performance in practice when you are only interested in certain portions of the data.

For this reason, your second structure is how I serialize code I’m interacting with in C.

... Also known as a very simple case of a “column-oriented database”, of which are several at various scales from Metakit[1] to Clickhouse[2]. It’s a neat way to have columns which are sparsely populated, required to accommodate large blobs, numerous but usually not accessed all at once, or frequently added and deleted.

Nothing’s perfect, of course: you can’t stream records in such a format, so no convenient Unix-style tooling.

[1]: http://www.equi4.com/metakit.html [2]: https://yandex.com/dev/clickhouse/

Higher performance, much smaller compressed and uncompressed, accepted by a wide range of tools, e.g. Pandas, and often more convenient to parse from statically typed languages.
> Yeah, lists of objects are pretty crap, because they're almost always homogeneous but unenforcédly so

We use JSON Schema heavily to enforce structure in JSON data. https://json-schema.org/

i've done the first when serializing an arbitrary table of trace data in a jsonb column. Did it to make it compact and then realized that if all i wanted to do was show it in a UI it was much easier to parse as a html table, and only marginally harder to present an array of objects.