by ludocode 1847 days ago
This is the real answer. All of the other answers are suggesting various changes to the JSON structure to eliminate key repetition, but this is irrelevant under compression.

Where it becomes relevant is when each record is stored as a separate document, so you can't just compress them all together. Compressing each record separately won't eliminate the duplication, so you're better off with either a columnar format (like a typical database) or a schema-based format (like protobuf).
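
A rough way to see the difference, as a sketch rather than a benchmark: the sample records below are made up, and zlib stands in for whatever compressor you actually use. It compresses the same records once as a single document and once record by record.

```python
import json
import zlib

# Hypothetical sample data: 1000 records that all share the same two keys.
records = [{"key1": f"a{i}", "key2": f"b{i}"} for i in range(1000)]

# Compress the whole collection as one document: repeated keys are
# replaced by short back-references after their first occurrence.
together = len(zlib.compress(json.dumps(records).encode("utf-8")))

# Compress each record as its own document: every record pays for its
# keys again, plus the per-stream overhead of the compressor.
separately = sum(
    len(zlib.compress(json.dumps(rec).encode("utf-8"))) for rec in records
)

print("together:", together, "bytes")
print("separately:", separately, "bytes")
```

The exact numbers depend on the data and the compressor, but the per-record total should come out much larger.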


To parse it, you need to check the keys. If there can't be any other keys, you can just use an array, whose element order is preserved in JSON, and save on the keys.

So you just have an array of arrays. Or even one huge flat array where every X elements start a new record.

If each one has 2 keys,

    [
        {
            "key1": "a",
            "key2": "b"
        },
        {
            "key1": "a",
            "key2": "b"
        }
    ]
Can become,

    [
        [
            "a",
            "b"
        ],
        [
            "a",
            "b"
        ]
    ]
Or just every 2 will be a new record,

    [
        "a",
        "b",
        "a",
        "b"
    ]
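
For what it's worth, the three layouts above can be converted into each other with a few lines. This is only a sketch: the helper names (`to_rows`, `to_flat`, `from_flat`) are made up for illustration, and it assumes every record has exactly the keys listed in `KEYS`, in a known order.

```python
def to_rows(records, keys):
    """Array of arrays: one inner list per record, values in `keys` order."""
    return [[rec[k] for k in keys] for rec in records]


def to_flat(records, keys):
    """One flat list: every len(keys) elements form a record."""
    return [rec[k] for rec in records for k in keys]


def from_flat(flat, keys):
    """Rebuild the keyed records from the flat list."""
    step = len(keys)
    if len(flat) % step != 0:
        raise ValueError("flat length is not a multiple of the key count")
    return [dict(zip(keys, flat[i:i + step])) for i in range(0, len(flat), step)]


KEYS = ["key1", "key2"]  # assumed fixed schema, agreed on out of band
records = [{"key1": "a", "key2": "b"}, {"key1": "a", "key2": "b"}]

rows = to_rows(records, KEYS)
flat = to_flat(records, KEYS)
print(rows)
print(flat)
print(from_flat(flat, KEYS) == records)
```

Note that the reader has to know `KEYS` from somewhere else, which is exactly the flexibility you give up by dropping the keys.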
But why? Why save on keys when compression will nearly eliminate them for you?

Compression mainly helps with transmission.

Trying to point out that the original structure allows for more flexibility.

If you only cared about space, the array form compresses better anyway, and uncompressed it still occupies less space.
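
If you want to check this on your own data, something like the following would do. It is a sketch with made-up sample records and zlib as the compressor; it prints the raw and compressed size of each layout, and the results will vary with the data.

```python
import json
import zlib

KEYS = ["key1", "key2"]  # assumed fixed schema
# Hypothetical sample data; swap in real records to get meaningful numbers.
records = [{"key1": f"a{i}", "key2": f"b{i}"} for i in range(1000)]

layouts = {
    "objects": records,
    "rows": [[rec[k] for k in KEYS] for rec in records],
    "flat": [rec[k] for rec in records for k in KEYS],
}

sizes = {}
for name, data in layouts.items():
    # Compact separators so whitespace doesn't skew the comparison.
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    sizes[name] = (len(raw), len(zlib.compress(raw)))
    print(f"{name}: raw={sizes[name][0]} compressed={sizes[name][1]}")
```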