by roelschroeven 1508 days ago
That discussion doesn't really clear up my confusion though.

I don't understand how bzip3 gets to claim "A better, faster and stronger spiritual successor to BZip2." when even all its own benchmarks show it's slower than bzip2?


bzip3 usually operates on bigger block sizes, up to 16 times bigger than bzip2. additionally, bzip3 supports parallel compression/decompression out of the box. for fairness, the benchmarks have been performed using single thread mode, but they aren't quite as fair towards bzip3 itself, as it uses a way bigger block size.

what bzip3 aims to be is a replacement for bzip2 on modern hardware. what used to not be viable decades ago (arithmetic coding, context mixing, SAIS algorithms for BWT construction) became viable nowadays, as CPU frequencies don't tend to change, while cache and RAM keep getting bigger and faster.

it should be noted that using block sizes 16 times larger than bzip2's and providing compression ratios up to 10%-50% better, at a cost of, as empirically shown, 17 seconds per 1.3GB of data, is a pretty good trade-off. if bzip2 wanted to get anywhere close to that (e.g. using the C API to tweak the block size), it'd have to sacrifice a lot of its performance.
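The block-size trade-off described above can be explored for bzip2 itself with a small sketch. It uses Python's standard `bz2` module, where `compresslevel` 1 to 9 selects a block size of 100k to 900k. bzip3 is not involved, the sample input is made up for illustration, and this does not reproduce the benchmarks discussed in this thread.

```python
import bz2
import time


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, seconds) for one bzip2 block-size setting."""
    start = time.perf_counter()
    compressed = bz2.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(data) / len(compressed), elapsed


# Hypothetical sample input (a few MB of repetitive text); real benchmarks
# would use a proper corpus instead.
sample = b"".join(b"record %d: the quick brown fox\n" % i for i in range(200_000))

# compresslevel N means bzip2 works on blocks of N * 100k bytes.
results = {level: measure(sample, level) for level in (1, 5, 9)}

for level, (ratio, seconds) in results.items():
    print(f"block size {level * 100}k: ratio {ratio:.2f}, {seconds:.3f}s")
```

Timings and ratios depend heavily on the input and the machine, so the numbers are only meaningful when compared against each other on the same data.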

First, I don't understand what the block size has to do with the number of threads used. I understand why one could consider the benchmarks unfair to bzip3 because they're single-threaded (depending on exactly how one defines "faster"), but why do you say they are not fair towards bzip3 because of the bigger block size it uses? Do the benchmarks not use the optimal block size for each compression tool?

I can see how bzip3 is better able to exploit the characteristics of modern hardware, but that's not enough to call it faster. The proof of the pudding is in the eating, and if the benchmarks show bzip3 is slower than bzip2, then bzip3 is clearly not faster (but perhaps "aiming to be faster", with some work to be done before it reaches that goal).

Better compression at a slightly slower pace can indeed be a good trade-off. I'm not saying bzip3 doesn't work well. What I'm saying is that the "faster" in its description "A better, faster and stronger spiritual successor to BZip2" is not supported by the evidence.

And what does stronger mean? It's not cryptography.

Unless it is

I read it as three quarters of a Daft Punk reference (only).

There is no shortage of compression tools that are better in this or that. At the very least this is a fun engineering exercise, but there is always a massive inertia problem regarding install base with compression, esp. for data-at-rest. gzip is still used by default in so many contexts, and that's not such a bad thing IMHO.

it's fairly common, at least in the circles i usually dwell in, to call compression ratio "compression _strength_". bzip3 is _better_ than bzip2 since it uses a better technological model as outlined in one of my replies.