by vicaya 5833 days ago
It really depends on the data. For highly redundant data, such as web pages with lots of boilerplate headers and footers, it can compress better, because bm_pack (the first pass of bmz) looks for long common patterns across the entire input. For typical text, its compression ratio should be a little worse than gzip's, but it should be faster.
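Some context on why a whole-input first pass matters. gzip uses DEFLATE, whose back-references can only reach about 32 KB behind the current position, so a repeat that starts farther back than that is invisible to it. The sketch below uses Python's standard `zlib` module (same DEFLATE format, default 32 KB window) on made-up random data. The seed and sizes are arbitrary illustration choices, not measurements of bmz.

```python
import random
import zlib

rng = random.Random(0)
chunk = rng.randbytes(20_000)                 # random bytes: incompressible on their own

near = chunk + rng.randbytes(10_000) + chunk  # repeat starts 30,000 bytes back
far = chunk + rng.randbytes(40_000) + chunk   # repeat starts 60,000 bytes back

for name, data in (("near", near), ("far", far)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: {len(data)} bytes, compressed/original = {ratio:.2f}")
```

In the `near` case the second copy of `chunk` falls inside the window and should shrink to a string of back-references. In the `far` case it does not, so the output should stay roughly the size of the input, even though the same 20,000 bytes repeat.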

BMZ = bmpack + lzo by default, and it can be combined with lzma if necessary. It's not really a clone of BMDiff and Zippy, as I've never had a chance to see Google's implementation. It's based on the original Bentley & McIlroy paper, "Data Compression Using Long Common Strings" (1999); even the two-pass idea comes from that paper. It was really a wacky experimental implementation (with a lot of room for improvement) written to satisfy my curiosity. I'm a little surprised that version 0.1 has been stable for quite a few people who have compressed TBs of data with it.
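Here is a minimal Python sketch of that two-pass structure. It assumes a simplified reading of the Bentley & McIlroy scheme. Karp-Rabin fingerprints of aligned blocks go into a table, and a rolling fingerprint is checked at every input position. Verified hits are extended and emitted as (offset, length) references, and the result is handed to a conventional compressor. The block size, the token format and every function name here are hypothetical; this is not bmz's actual code or file format. zlib stands in for LZO, which is not in Python's standard library, and `use_lzma=True` swaps in the stdlib `lzma` module as the second stage.

```python
import lzma
import struct
import zlib

BLOCK = 32            # assumed fingerprint block size b (hypothetical, not bmz's setting)
BASE = 257            # Karp-Rabin polynomial base
MOD = (1 << 61) - 1   # prime modulus for fingerprints


def fingerprint(data: bytes, start: int, length: int) -> int:
    """Polynomial (Karp-Rabin) fingerprint of data[start:start + length]."""
    h = 0
    for byte in data[start:start + length]:
        h = (h * BASE + byte) % MOD
    return h


def long_string_pass(data: bytes, block: int = BLOCK) -> bytes:
    """First pass: replace repeats of >= `block` bytes found anywhere earlier
    in the input. Made-up token format: b"L" + u32 size + literal bytes,
    or b"C" + u32 source offset + u32 size."""
    out = bytearray()
    n = len(data)
    table = {}                 # fingerprint -> offset of an earlier aligned block
    next_aligned = 0           # next aligned block start to put in the table
    top = pow(BASE, block - 1, MOD)
    lit_start = i = 0
    h = fingerprint(data, 0, block) if n >= block else 0

    def emit_literal(start: int, end: int) -> None:
        if start < end:
            out.extend(b"L" + struct.pack("<I", end - start) + data[start:end])

    while i + block <= n:
        while next_aligned + block <= i:   # index aligned blocks fully behind i
            table.setdefault(fingerprint(data, next_aligned, block), next_aligned)
            next_aligned += block
        j = table.get(h)
        if j is not None and data[j:j + block] == data[i:i + block]:
            length = block                 # verified hit: extend it forward
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            emit_literal(lit_start, i)
            out.extend(b"C" + struct.pack("<II", j, length))
            i += length
            lit_start = i
            if i + block <= n:
                h = fingerprint(data, i, block)
            continue
        if i + block < n:                  # roll the fingerprint one byte forward
            h = ((h - data[i] * top) * BASE + data[i + block]) % MOD
        i += 1
    emit_literal(lit_start, n)
    return bytes(out)


def undo_long_string_pass(packed: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(packed):
        if packed[pos:pos + 1] == b"L":
            (size,) = struct.unpack_from("<I", packed, pos + 1)
            out += packed[pos + 5:pos + 5 + size]
            pos += 5 + size
        else:                              # b"C" reference
            src, size = struct.unpack_from("<II", packed, pos + 1)
            for k in range(size):          # byte by byte, so overlapping copies work
                out.append(out[src + k])
            pos += 9
    return bytes(out)


def bmz_like_compress(data: bytes, use_lzma: bool = False) -> bytes:
    packed = long_string_pass(data)
    # Second pass: zlib stands in for LZO; lzma is the heavier alternative.
    return lzma.compress(packed) if use_lzma else zlib.compress(packed)


def bmz_like_decompress(blob: bytes, use_lzma: bool = False) -> bytes:
    packed = lzma.decompress(blob) if use_lzma else zlib.decompress(blob)
    return undo_long_string_pass(packed)
```

A round trip is `bmz_like_decompress(bmz_like_compress(data))`. When a long block repeats far apart in the input, the first pass replaces the later copy with a single 9-byte reference before the second stage ever sees it. A compressor with a 32 KB window cannot make that substitution by itself. The sketch skips the bounds checks a real decoder would need for untrusted input, and its u32 fields cap inputs at 4 GiB.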