by faleidel 1431 days ago
I would like to see gzip compression added to the benchmark

    curl https://raw.githubusercontent.com/mortie/jcof/main/tests/corpus/meteorites.json | gzip -9 | wc -c

This gives me 34569 bytes.
So the comparison is:

    JSON: 244920 bytes
    JCOF: 87028 bytes
    GZIP: 34569 bytes
To be fair you would gzip the JCOF encoding in this example too.
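A rough Python equivalent of the `gzip -9 | wc -c` measurement, using only the standard library. `gzip_size` is a hypothetical helper name, and because Python's `gzip` module (built on zlib) and the `gzip` CLI use different deflate implementations, the byte counts may differ slightly from what the command line reports.

```python
import gzip


def gzip_size(data: bytes, level: int = 9) -> int:
    """Return the length of `data` after gzip compression at `level`.

    Roughly mirrors `gzip -<level> | wc -c`. mtime=0 keeps the output
    deterministic, so repeated runs give the same size.
    """
    return len(gzip.compress(data, compresslevel=level, mtime=0))
```

For example, after downloading the file, read `meteorites.json` in binary mode and pass its bytes to `gzip_size`; applying the same helper to the JCOF output gives the "gzip the JCOF encoding too" number.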

The author mentions that gzip doesn’t work for some use cases. For the use case mentioned, I’d expect SQLite to be similar; at least, that is the default thing I’d reach for.

If for some reason SQLite wasn’t sufficient, a custom binary encoding, controlled and updated via code instead of config, would probably be next.

> To be fair you would gzip the JCOF encoding in this example too.

Just tested my 84M fake social media file: `jcof` gives 44M, `gzip` gives 19M, and `jcof+gzip` gives 17M. In essence, you've saved 2M at the cost of running two CPU-intensive procedures instead of one. Doesn't seem all that worth it?

10% is a lot of your egress.
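Using the rounded sizes from the 84M test above, the extra saving from running `jcof` before `gzip`, relative to plain `gzip`, works out as:

```latex
\frac{19\,\mathrm{M} - 17\,\mathrm{M}}{19\,\mathrm{M}} = \frac{2}{19} \approx 0.105 \approx 10\%
```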
A fair point.

That prompted me to check whether the higher zstd levels worked any better on my 84MB fake social graph (nope), and then whether LZMA was any good (yes): `lzma` at level 5 or higher on the raw JSON beats `jcof | lzma` by ~2M every time, and `lzma -4` beats it by ~400k.

If I sort my object keys (à la `jq -S`), `lzma` beats `jcof|lzma` at every level (`gzip` never gets close; `zstd` gets closer).

EDIT: Nope, I was wrong: I was comparing `lzma` against `jcof|zstd`. `jcof|lzma` still sneaks in ~1M below `lzma` at all levels.
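A minimal Python sketch of the key-sorting experiment, assuming compact JSON output. `sort_keys=True` approximates the key ordering of `jq -S -c` but not jq's exact formatting of numbers and strings, and `lzma.compress` (which writes the .xz container) stands in for the `lzma`/`xz` CLIs, so absolute sizes will differ by a few header bytes. `xz_size` and `compare_key_order` are hypothetical helper names.

```python
import json
import lzma


def xz_size(data: bytes, preset: int = 9) -> int:
    """Compressed length of `data` in the .xz format (roughly `xz -<preset> | wc -c`)."""
    return len(lzma.compress(data, preset=preset))


def compare_key_order(obj, preset: int = 9) -> dict:
    """Compress `obj` as compact JSON, once as-is and once with sorted keys."""
    compact = (",", ":")  # no whitespace, similar to `jq -c`
    as_is = json.dumps(obj, separators=compact).encode("utf-8")
    # sort_keys=True approximates `jq -S` key ordering only
    by_key = json.dumps(obj, separators=compact, sort_keys=True).encode("utf-8")
    return {"as_is": xz_size(as_is, preset), "sorted": xz_size(by_key, preset)}
```

When objects of the same shape arrive with their keys in varying order, sorting makes consecutive records textually more alike, which gives an LZ-style compressor longer repeated matches to exploit.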
meteorites.json, re-encoded via both JSON.stringify() and jcof.stringify() (sizes in bytes):

                 json    jcof  jcof/json
    plain      244975   87083      0.355
    gzip -6     35829   33152      0.925
    gzip -9     34384   32875      0.956
    xz -9       27864   28696      1.030
And I imagine this is close to ideal for jcof. So unless those last few percent really matter, gzipped JSON is probably much better in the general case.
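To build a table like the one above for other data, a Python sketch along these lines could work. It assumes no Python JCOF encoder: the second input is simply the bytes that `jcof.stringify()` produced, saved to a file from JavaScript. `compressed_sizes`, `ratio_table`, and `print_table` are hypothetical names, and `gzip.compress`/`lzma.compress` only approximate the `gzip` and `xz` CLIs, so expect small byte-level differences.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Sizes for each row: plain, gzip -6, gzip -9, xz -9 (approximations)."""
    return {
        "plain": len(data),
        "gzip -6": len(gzip.compress(data, compresslevel=6, mtime=0)),
        "gzip -9": len(gzip.compress(data, compresslevel=9, mtime=0)),
        "xz -9": len(lzma.compress(data, preset=9)),  # .xz container, like `xz`
    }


def ratio_table(json_bytes: bytes, other_bytes: bytes) -> list:
    """Rows of (method, json size, other size, other/json ratio rounded to 3 places)."""
    a = compressed_sizes(json_bytes)
    b = compressed_sizes(other_bytes)
    return [(m, a[m], b[m], round(b[m] / a[m], 3)) for m in a]


def print_table(rows) -> None:
    """Print the rows in the same column layout as the table above."""
    print(f"{'':13}{'json':>8}{'jcof':>8}{'jcof/json':>11}")
    for method, j, o, r in rows:
        print(f"    {method:<9}{j:>8}{o:>8}{r:>11.3f}")
```

For example, read `meteorites.json` and the saved jcof output in binary mode and call `print_table(ratio_table(json_bytes, jcof_bytes))`.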