Hacker News new | ask | show | jobs
by pufuwozu 5802 days ago
Isn't Pigz a parallel implementation of gzip which is itself based upon LZ77?

http://www.zlib.net/pigz/

Hasn't there been a few parallel implementations of bzip2?

http://compression.ca/pbzip2/

http://bzip2smp.sourceforge.net/

I understand that parallelism isn't suited for a lot of algorithms but from what I can tell, there's been heaps of successful work in compression. Could you give us some more information?

It would really help me out - I'm doing an undergraduate class where I have to manually make an open-source project parallel. I've been looking into compression algorithms because I thought they were well suited. Please help me out if I'm going down the wrong path!

2 comments

The key to malkia's strawman is that pigz does not give the same results as the sequential "c" version of gzip. AFAIK gzip is not parallelizable as-is because every symbol depends on the previous symbol; pigz breaks this dependency to get parallelism but gives up a little compression efficiency. In practice this does not matter, which is why LZ is not a good example.
Compression algorithms are poorly suited to parallelism, because they remove everything that isn't a data dependency in the input, and parallelism is nothing but a lack of data dependencies.

The trick is to start at the largest chunk possible and go down until you find where they have left in some, uh, non-dependencies - like bzip2 which has independent x*100KB blocks, and video which (usually) has independent frames. You should be able to get 2-4 separate tasks out of that, which is good enough for CPUs.