| HN Mirror



	by faleidel 1477 days ago
	I would like to see gzip compression added to the benchmark


mg 1477 days ago

    curl https://raw.githubusercontent.com/mortie/jcof/main/tests/corpus/meteorites.json | gzip -9 | wc -c

    Gives me 34569

So the comparison is:

    JSON: 244920 bytes
    JCOF: 87028 bytes
    GZIP: 34569 bytes
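
The same kind of measurement can be sketched in Python. This is only a rough sketch: the `gzip_size` helper and the sample records are made up for illustration (they are not the meteorites corpus), and Python's `gzip` module may not produce byte-identical output to the `gzip` CLI, so sizes can differ slightly.

```python
import gzip
import json

def gzip_size(data: bytes, level: int = 9) -> int:
    """Return the size in bytes of `data` after gzip compression."""
    return len(gzip.compress(data, compresslevel=level))

# Hypothetical stand-in for a JSON corpus: many records with repeated keys.
records = [{"name": f"rock-{i}", "mass": i * 10, "fall": "Fell"} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")

print("JSON:", len(raw), "bytes")
print("GZIP:", gzip_size(raw), "bytes")
```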


eof 1477 days ago

To be fair you would gzip the JCOF encoding in this example too.

The author mentions that gzip doesn’t work for some use cases. For the use case mentioned, I’d expect sqlite to be similar; at least, that is the default thing I’d reach for.

If for some reason sqlite wasn’t sufficient, a custom binary encoding, controlled and updated via code instead of config, would probably be next.


zimpenfish 1477 days ago

> To be fair you would gzip the JCOF encoding in this example too.

Just tested my 84M fake social media file - `jcof` gives 44M, `gzip` gives 19M, `jcof+gzip` gives 17M. In essence, you've saved 2M at the cost of two CPU-intensive procedures instead of one. Doesn't seem all that worth it?


Beltiras 1477 days ago

10% is a lot of your egress.
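
Roughly where the 10% comes from, using the rounded sizes quoted in the parent comment (19M for `gzip`, 17M for `jcof+gzip`). A back-of-the-envelope sketch, nothing more; the variable names are just for illustration:

```python
gzip_only_mb = 19   # `gzip` on the raw JSON, as quoted in the parent comment
jcof_gzip_mb = 17   # `jcof+gzip`, as quoted in the parent comment

# Fraction of transferred bytes saved by adding the jcof step before gzip.
saving = (gzip_only_mb - jcof_gzip_mb) / gzip_only_mb
print(f"{saving:.1%}")  # prints 10.5%
```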


zimpenfish 1477 days ago

A fair point.

Prompted me to check if the higher zstd levels worked any better on my 84MB fake social graph - nope - and then if LZMA was any good - yes, `lzma` at 5 or higher on the raw JSON beats `jcof | lzma` by ~2M every time. `lzma -4` beats it by ~400k.

If I sort my object keys (a la `jq -S`), `lzma` beats `jcof|lzma` at every level (`gzip` never gets close, `zstd` gets closer.)
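
For anyone who wants to try the key-sorting experiment on their own data, a minimal Python sketch follows. Assumptions: `lzma_sizes` and the sample `users` list are hypothetical, `sort_keys=True` is only similar in spirit to `jq -S`, and Python's `lzma` module writes the `.xz` container by default, so the byte counts will not match the `lzma` CLI exactly. Whether sorting helps depends entirely on the data.

```python
import json
import lzma

def lzma_sizes(obj, presets=(4, 5, 9)):
    """Map each preset to (size of unsorted JSON, size of key-sorted JSON) after LZMA."""
    plain = json.dumps(obj).encode("utf-8")
    # sort_keys=True orders object keys alphabetically, similar in spirit to `jq -S`.
    by_key = json.dumps(obj, sort_keys=True).encode("utf-8")
    return {
        p: (len(lzma.compress(plain, preset=p)), len(lzma.compress(by_key, preset=p)))
        for p in presets
    }

# Hypothetical data: the same fields, but with inconsistent key order per record.
users = []
for i in range(500):
    rec = {"name": f"user{i}", "id": i, "follows": [i + 1, i + 2]}
    if i % 2:
        rec = dict(reversed(list(rec.items())))
    users.append(rec)

for preset, (unsorted_size, sorted_size) in lzma_sizes(users).items():
    print(f"preset {preset}: unsorted={unsorted_size} sorted={sorted_size}")
```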


zimpenfish 1477 days ago

EDIT: Nope, I was wrong, I was doing `lzma` against `jcof|zstd`. `jcof|lzma` is still sneaking ~1M below `lzma` at all levels.


jffry 1477 days ago

meteorites.json, re-encoded via both JSON.stringify() and jcof.stringify():

               json   jcof  jcof/json
    plain    244975  87083      0.355
    gzip -6   35829  33152      0.925
    gzip -9   34384  32875      0.956
    xz -9     27864  28696      1.030

And I imagine this is close to ideal for jcof. So unless those last few percent really matter, gzipped JSON is probably much better in the general case.
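
The jcof/json column can be re-derived from the byte counts in the table. A small Python sketch, with the numbers copied from the table and the variable names chosen only for illustration:

```python
# Sizes in bytes, copied from the table: (json, jcof).
sizes = {
    "plain":   (244975, 87083),
    "gzip -6": (35829, 33152),
    "gzip -9": (34384, 32875),
    "xz -9":   (27864, 28696),
}

# Ratio below 1.0 means jcof is smaller; above 1.0 means plain JSON wins.
ratios = {name: jcof / js for name, (js, jcof) in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name:>8}  {ratio:.3f}")
```

With `xz -9` the ratio lands just above 1.0, i.e. the JCOF version is slightly larger than the plain JSON after compression.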
