by cryptonector 1521 days ago
JSONB is incredibly awesome, and should be extracted from PG and made usable on its own.

For those who don't know, JSONB is a binary JSON encoding that is specifically optimized for data at rest and compression thereof.

The key feature in JSONB is that most internal pointers [from arrays and objects] to values are in the form of lengths, with every 32nd pointer being an offset. This comes from the observation that offsets will not repeat, and are therefore difficult to compress w/ off-the-shelf compression algorithms, but length values will often be the same and thus be compressible. This means that iterating an array (say) requires 31 additions for every 32 elements to recover the offsets to those 31 elements' values.
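
To make the lengths-vs-offsets idea concrete, here is a rough Python sketch. It is not PG's actual on-disk format: the function names, the flat list of integers, and the choice to store an end offset in every 32nd slot are assumptions made for illustration. It shows how the offsets are recovered by running additions, and how one might compare the two layouts with zlib.

```python
import struct
import zlib

STRIDE = 32  # assumed: every 32nd entry holds an offset, the rest hold lengths


def encode_entries(lengths, stride=STRIDE):
    """One integer per value: an end offset in every stride-th slot, else the length."""
    entries = []
    end = 0
    for i, n in enumerate(lengths):
        end += n
        entries.append(end if i % stride == 0 else n)
    return entries


def decode_start_offsets(entries, stride=STRIDE):
    """Recover each value's start offset; length slots cost one addition each."""
    starts = []
    end = 0  # end offset of the previous value
    for i, e in enumerate(entries):
        starts.append(end)
        end = e if i % stride == 0 else end + e
    return starts


def compressed_size(entries):
    """Pack entries as little-endian uint32 and return the zlib-compressed size."""
    raw = struct.pack("<%dI" % len(entries), *entries)
    return len(zlib.compress(raw))


if __name__ == "__main__":
    lengths = [8] * 1000                           # many values of the same size
    mostly_lengths = encode_entries(lengths)       # lengths plus sparse offsets
    offsets_only = encode_entries(lengths, stride=1)  # every slot is an offset
    print("mostly lengths:", compressed_size(mostly_lengths), "bytes compressed")
    print("offsets only:  ", compressed_size(offsets_only), "bytes compressed")
```

With equal-sized values the mostly-lengths layout is nearly one repeated integer, which is what a general-purpose compressor handles well; the offsets-only layout is a strictly increasing sequence with no repeated values.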

The story of how they came to this optimization for compression is fascinating. IIRC they implemented an offsets-only JSONB and were very happy with it until they discovered that that form of JSONB did not compress anywhere near as well as expected, and since PG was close to shipping, a feverish hunt for the cause ensued that culminated in the fix of mostly-using-lengths-instead-of-offsets.


I really wish it preserved key order ... it is quite annoying to lose this at the storage layer ...
So preserving key order is... nice for some things, but what's nice about JSONB is that it's optimized for reading and querying.
I am curious to know an example where key order would matter.
Had this exact issue. The UBL [1] standard has a primary XML representation where the order of elements is enforced in the schema. It also has a JSON representation, so when going from JSON to XML the exact order is needed to obtain valid XML.

[1] https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=...