by ellimilial
1521 days ago
If it fits on a single machine: jq, flat files, JSON Lines / Avro if relatively flat. Change to a tabular format if nesting is not required. Postgres JSONB works, but it requires maintaining a heavy server process. So does Lucene/Elasticsearch. I have been yearning for an embeddable store, in line with SQLite, with support that both works and also keeps the data compressed like JSONB. I know there were some attempts; I tried some of those, and they were mostly monstrosities.
For those who don't know, JSONB is a binary JSON encoding that is specifically optimized for data at rest and compression thereof.
The key feature in JSONB is that most internal pointers [from arrays and objects] to values are stored as lengths, with every 32nd pointer being an offset. This comes from the observation that offsets never repeat and are therefore difficult to compress with off-the-shelf compression algorithms, whereas length values will often be the same and thus compress well. The cost is that iterating an array (say) requires 31 additions for every 32 elements to recover the offsets of those 31 elements' values.
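To make the lengths-vs-offsets trade-off concrete, here is a rough Python sketch. It is a toy model, not the actual PostgreSQL on-disk layout: the names `encode_entries`, `start_offset`, and `compressed_sizes` are made up for illustration, entries are plain 4-byte integers, and I am assuming the offset entries sit at indexes 0, 32, 64, and so on.

```python
import struct
import zlib

STRIDE = 32  # assumption: every 32nd entry stores an end offset, the rest store lengths


def encode_entries(lengths, stride=STRIDE):
    """Return (is_offset, value) pairs: mostly lengths, periodically an end offset."""
    entries = []
    end = 0
    for i, n in enumerate(lengths):
        end += n
        if i % stride == 0:
            entries.append((True, end))   # absolute end offset of value i
        else:
            entries.append((False, n))    # just the length of value i
    return entries


def start_offset(entries, index):
    """Recover where value `index` starts by summing backwards to the nearest offset."""
    offset = 0
    for i in range(index - 1, -1, -1):
        is_offset, value = entries[i]
        offset += value
        if is_offset:
            break  # an end offset already includes everything before it
    return offset


def pack(values):
    """Pack integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)


def compressed_sizes(lengths):
    """Return zlib-compressed sizes of (offsets-only, mostly-lengths) encodings."""
    offsets_only = []
    end = 0
    for n in lengths:
        end += n
        offsets_only.append(end)
    mixed = [value for _, value in encode_entries(lengths)]
    return len(zlib.compress(pack(offsets_only))), len(zlib.compress(pack(mixed)))


if __name__ == "__main__":
    lengths = [5] * 1000  # e.g. an array of 1000 same-sized values
    offsets_size, mixed_size = compressed_sizes(lengths)
    print("offsets-only, compressed bytes:", offsets_size)
    print("mostly-lengths, compressed bytes:", mixed_size)
```

With uniform value sizes, the mostly-lengths stream should compress to a small fraction of the offsets-only stream, while `start_offset` never has to sum more than one stride's worth of entries.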
The story of how they came to this optimization for compression is fascinating. IIRC they implemented an offsets-only JSONB and were very happy with it until they discovered that that form of JSONB did not compress anywhere near as well as expected, and since PG was close to shipping, a feverish hunt for the cause ensued that culminated in the fix of mostly-using-lengths-instead-of-offsets.