| HN Mirror

by zippie 5124 days ago

I'd like to emphasize that this is only really useful in environments where gzip is not available (as the OP notes). Some tests using the demo JSON (minified):

test.json = 285 bytes

test.rjson = 233 bytes (18% smaller)

test.json.gz = 205 bytes (28% smaller)

If you are able to bundle an RJSON parser, why not just bundle an existing, well-understood and well-tested compression scheme such as http://stuartk.com/jszip/ or https://github.com/olle/lz77-kit/blob/master/src/main/js/lz7... instead?
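
For anyone who wants to repeat this kind of size comparison on their own payload, here is a minimal Python sketch. The demo JSON from the post is not reproduced here, so `sample` is a made-up stand-in, and `percent_saved` and `gzip_report` are hypothetical helper names, not part of RJSON or any library.

```python
import gzip
import json


def percent_saved(original_size: int, packed_size: int) -> float:
    """Size reduction relative to the original, as a percentage."""
    return 100.0 * (original_size - packed_size) / original_size


def gzip_report(obj) -> dict:
    """Minify obj as JSON, gzip it, and report both sizes in bytes."""
    # separators=(",", ":") drops the optional whitespace (minified JSON).
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    packed = gzip.compress(raw, compresslevel=9)
    return {
        "raw": len(raw),
        "gzip": len(packed),
        "saved_pct": percent_saved(len(raw), len(packed)),
    }


if __name__ == "__main__":
    # Assumption: a made-up payload with repeated keys, not the demo JSON.
    sample = [
        {"first": "Homer", "last": "Simpson", "email": f"user{i}@example.com"}
        for i in range(8)
    ]
    print(gzip_report(sample))
```

The exact numbers depend heavily on the payload; very small inputs can even grow under gzip because of its fixed framing.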

2 comments

ZenPsycho 5124 days ago

An arithmetic coding scheme which has a model based on the probabilities found in JSON abstract syntax trees would significantly improve on most typically used generic compression schemes. Arithmetic coding schemes have largely been avoided thus far due to patents which have recently expired, if I remember correctly.

Using the order-2 precise model on this page, I get 190 bytes, and that is still a generic, non-JSON model: http://nerget.com/compression/
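
To make the "order-N model" idea concrete, here is a rough Python sketch that estimates how many bits an ideal arithmetic coder would need when driven by a simple adaptive byte-context model. It is not the model used on the linked page and not a JSON-aware model; the add-one smoothing and the function name `adaptive_code_length_bits` are assumptions made for illustration. A real arithmetic coder gets within a few bits of this total.

```python
import math
from collections import defaultdict


def adaptive_code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length, in bits, of data under an adaptive order-`order`
    byte model with add-one (Laplace) smoothing over 256 symbols."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, sym in enumerate(data):
        # The context is the previous `order` bytes (shorter at the start).
        ctx = data[max(0, i - order):i]
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Update the model only after coding the symbol, so a decoder
        # could maintain the same state.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    text = b'{"a":1,"b":2}' * 20  # assumption: toy input, not the demo JSON
    for order in (0, 1, 2):
        print(order, math.ceil(adaptive_code_length_bits(text, order) / 8), "bytes")
```

On short inputs a higher order is not automatically better, since each context starts with no statistics; a model that knows JSON structure in advance would avoid part of that learning cost.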

zippie 5124 days ago

This. JSON-specific compression schemes aren't going to yield gains over AST-friendly schemes unless the JSON serialization specification changes significantly.

Along these lines - shipping a schema with the data payload is avro-like ... which is also questionable in terms of efficiency when compared with gzip/LZO.

ZenPsycho 5124 days ago

hey look, I found this http://research.microsoft.com/en-us/projects/jszap/

zippie 5124 days ago

They are using gzip compression level 1. Bogus.

ZenPsycho 5122 days ago

Are you referring to the graph, in which they set the gzip compression as "1" in order to clearly show the ratio of compression improvement that their technique has over gzip?

microtonal 5124 days ago

And if you use gzip on a file, it has some overhead (the 10-byte gzip header) and a freshly initialized deflate state. Usually, compression improves when more data is seen, since the dynamic Huffman tree improves and there are more blocks for LZ77 to backreference.
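
Both effects are easy to see with a small Python sketch. The records below are made up for illustration and `gzip_ratio` is a hypothetical helper name. Besides the 10-byte header, the gzip format also appends an 8-byte trailer (CRC-32 and input size), so even an empty input costs at least 18 bytes.

```python
import gzip
import json


def gzip_ratio(n_records: int) -> float:
    """Compressed size divided by raw size for n_records similar JSON objects."""
    # Assumption: synthetic records with repeated keys, not real data.
    records = [
        {"id": i, "name": f"user{i}", "active": True} for i in range(n_records)
    ]
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    # Fixed framing cost: header + empty deflate stream + trailer.
    print("empty input:", len(gzip.compress(b"")), "bytes")
    # The ratio drops as the compressor sees more repeated structure.
    for n in (1, 10, 100):
        print(n, "records:", round(gzip_ratio(n), 3))
```

With a single tiny record the output is larger than the input, which is why a test on a 285-byte file understates what gzip does on realistic payloads.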
