Hacker News
by dschoon 5834 days ago
I built this and tried it out. I'm rather shocked to see both better speed and a better compression ratio than gzip -9.

The results of my unscientific test (compression only):

   Compressor  Size  Ratio    Time
   gzip -1     23MB    88%   1.18s
   gzip -2     23MB    87%   1.38s
   bzip2       23MB    87%   5.57s
   xz -1       23MB    87%   5.35s
   xz -9       11MB    43%  10.58s
   bmz         13MB    45%   0.95s
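
For anyone who wants to repeat a comparison like this, here is a rough Python sketch of a timing harness. It is not what produced the numbers above: it uses the standard-library `gzip`, `bz2`, and `lzma` modules as stand-ins for the command-line tools, the names `benchmark`, `format_rows`, and `COMPRESSORS` are made up for illustration, and it assumes "Ratio" means compressed size divided by original size. bmz itself is not included because it has no standard Python binding that I know of.

```python
import bz2
import gzip
import lzma
import time

# Stand-ins for the command-line tools; levels mirror the table's labels.
COMPRESSORS = {
    "gzip -1": lambda d: gzip.compress(d, compresslevel=1),
    "gzip -2": lambda d: gzip.compress(d, compresslevel=2),
    "bzip2": lambda d: bz2.compress(d),
    "xz -1": lambda d: lzma.compress(d, preset=1),
    "xz -9": lambda d: lzma.compress(d, preset=9),
}


def benchmark(data: bytes, compressors: dict) -> list:
    """Return (name, compressed_size, ratio, seconds) for each compressor."""
    if not data:
        raise ValueError("need non-empty input to compute a ratio")
    rows = []
    for name, compress in compressors.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        # Ratio is assumed to be compressed size / original size.
        rows.append((name, len(out), len(out) / len(data), elapsed))
    return rows


def format_rows(rows: list) -> str:
    """Render the rows as a plain-text table."""
    lines = [f"{'Compressor':<12}{'Size':>10}{'Ratio':>7}{'Time':>9}"]
    for name, size, ratio, secs in rows:
        lines.append(f"{name:<12}{size:>10}{ratio:>7.0%}{secs:>8.2f}s")
    return "\n".join(lines)
```

Reading a file into `data` and calling `print(format_rows(benchmark(data, COMPRESSORS)))` prints one row per compressor. A single in-memory run like this is still unscientific: it ignores disk I/O, process startup, and run-to-run variance.
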
1 comment

It really depends on the data. For highly redundant data, such as web pages with a lot of boilerplate in headers and footers, it can compress better because bm_pack (the first pass of bmz) looks for large common patterns across the entire input. For typical text, it should compress a little worse than gzip but run faster.

BMZ = bmpack + lzo by default, and it can be combined with lzma if necessary. It's not really a BMDiff and Zippy clone, as I've never had a chance to see Google's implementation. It's based on the original Bentley & McIlroy paper, "Data Compression Using Long Common Strings" (1999). Even the two-pass idea is from that paper. It was really a wacky experimental implementation (with a lot of room for improvement) to satisfy my curiosity. I'm a little surprised that the 0.1 version has been stable for quite a few people compressing TBs of data through it.
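
To make the two-pass idea concrete, here is a minimal Python sketch in the spirit of the Bentley & McIlroy scheme. It is not the bmz code and makes several simplifying assumptions: the names `lcs_pack`, `lcs_unpack`, `two_pass_compress`, and `two_pass_decompress` are hypothetical, the token format is invented for this example, the block table is keyed by the raw block bytes rather than a rolling fingerprint, matches are only extended forward, and `zlib` stands in for lzo as the second pass.

```python
import struct
import zlib

BLOCK = 32  # only repeats at least this long are detected


def lcs_pack(data: bytes, block: int = BLOCK) -> bytes:
    """First pass: replace long repeats with (source offset, length) copies."""
    table = {}       # block bytes -> earliest aligned start offset
    out = bytearray()
    lit_start = 0    # start of the pending literal run
    next_index = 0   # next aligned block not yet added to the table
    i, n = 0, len(data)

    def emit_literal(chunk: bytes) -> None:
        out.extend(struct.pack("<BI", 0, len(chunk)))
        out.extend(chunk)

    while i + block <= n:
        # Index only aligned blocks that lie entirely before position i.
        while next_index + block <= i:
            table.setdefault(data[next_index:next_index + block], next_index)
            next_index += block
        src = table.get(data[i:i + block])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes keep agreeing.
        length = block
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        if lit_start < i:
            emit_literal(data[lit_start:i])
        out.extend(struct.pack("<BII", 1, src, length))
        i += length
        lit_start = i
    if lit_start < n:
        emit_literal(data[lit_start:])
    return bytes(out)


def lcs_unpack(packed: bytes) -> bytes:
    """Inverse of lcs_pack."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        if packed[pos] == 0:  # literal run
            (length,) = struct.unpack_from("<I", packed, pos + 1)
            pos += 5
            out += packed[pos:pos + length]
            pos += length
        else:                 # copy from earlier output
            src, length = struct.unpack_from("<II", packed, pos + 1)
            pos += 9
            # Byte-by-byte so a copy may overlap the bytes it is producing.
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)


def two_pass_compress(data: bytes) -> bytes:
    # Second pass: a general-purpose compressor (zlib here, as a stand-in).
    return zlib.compress(lcs_pack(data))


def two_pass_decompress(blob: bytes) -> bytes:
    return lcs_unpack(zlib.decompress(blob))
```

The first pass can match a repeat anywhere earlier in the input, however far back, which is why it helps on pages that share boilerplate. The second pass then handles the short-range redundancy that the block-sized matching skips over.
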