| HN Mirror

by malkia 5802 days ago

Let's face it - it's not the language that is the problem. It would help you, but it would help you only with some percentage of the problems out there.

For example - there is no practical language (or design) in which you can implement LZ compression (zlib and others) in parallel so that it gives the same results as the sequential C version.

It's just that certain algorithms, and hence the protocols, data structures and standards built on them, are not that well suited to parallel processing.

Okay, in the first case, maybe you can split the incoming data into 128 KB chunks and process each one individually ... but that's not the same - you can't reuse the LZ window across chunks.
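To make the chunking idea concrete, here is a minimal Python sketch. The helper names and the thread pool are assumptions for illustration, not anything from the comment. It compresses each 128 KB chunk as its own zlib stream, so no chunk can refer back to data in an earlier one, and the result is not byte-identical to the single-stream output even though it decompresses to the same data.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

CHUNK_SIZE = 128 * 1024  # the 128 KB figure from the comment above


def compress_sequential(data: bytes) -> bytes:
    """One zlib stream over the whole input (the sequential baseline)."""
    return zlib.compress(data, 6)


def _compress_chunk(chunk: bytes) -> bytes:
    # Each chunk starts with an empty LZ window: no history is shared.
    return zlib.compress(chunk, 6)


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Compress fixed-size chunks independently, one zlib stream per chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(_compress_chunk, chunks))


def decompress_chunked(parts: List[bytes]) -> bytes:
    """Decompress every chunk and stitch the original data back together."""
    return b"".join(zlib.decompress(p) for p in parts)
```

The output is a list of separate streams rather than one stream, which is exactly the "not the same" result the comment describes.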

Really, the problem is the 13 dwarfs that university folks have identified: 13 stereotypical problems that relate to 99% of what's being done with a programming language. Some of the dwarfs are speedy parallel gnomes; some of them are old, slow and stubborn, like Gimli from LOTR.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-18...

3 comments

pufuwozu 5802 days ago

Isn't Pigz a parallel implementation of gzip which is itself based upon LZ77?

http://www.zlib.net/pigz/

Haven't there been a few parallel implementations of bzip2?

http://compression.ca/pbzip2/

http://bzip2smp.sourceforge.net/

I understand that a lot of algorithms aren't suited to parallelism, but from what I can tell, there's been heaps of successful work in compression. Could you give us some more information?

It would really help me out - I'm doing an undergraduate class where I have to manually make an open-source project parallel. I've been looking into compression algorithms because I thought they were well suited. Please help me out if I'm going down the wrong path!

wmf 5802 days ago

The key to malkia's strawman is that pigz does not give the same results as the sequential C version of gzip. AFAIK gzip is not parallelizable as-is because every symbol depends on the previous symbol; pigz breaks this dependency to get parallelism but gives up a little compression efficiency. In practice this does not matter, which is why LZ is not a good example.
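One way to break the dependency while still producing a single stream can be sketched in Python as follows. This is an illustration under stated assumptions, not pigz's actual code: each block is deflated by its own compressor, primed with the previous 32 KB of input as a preset dictionary, and the pieces are joined into one raw deflate stream. The function names, the 128 KB block size and the thread pool are all hypothetical choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

WINDOW = 32 * 1024       # deflate's maximum back-reference distance
BLOCK_SIZE = 128 * 1024  # hypothetical block size chosen for this sketch


def _deflate_block(job):
    block, history, is_last = job
    if history:
        # Prime the window with the preceding input so matches can reach back.
        comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15,
                                zdict=history)
    else:
        comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    out = comp.compress(block)
    # Z_SYNC_FLUSH ends the piece on a byte boundary without closing the
    # stream; only the last block is finished with Z_FINISH.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def parallel_deflate(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Deflate blocks concurrently and join them into one raw deflate stream."""
    if not data:
        return zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15).flush()
    jobs = [
        (data[o:o + block_size], data[max(0, o - WINDOW):o],
         o + block_size >= len(data))
        for o in range(0, len(data), block_size)
    ]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(_deflate_block, jobs))


def inflate(stream: bytes) -> bytes:
    """Decode the joined stream with an ordinary raw-deflate decompressor."""
    return zlib.decompressobj(-15).decompress(stream)
```

The joined stream decodes with a standard decompressor, but its bytes differ from what a single sequential compressor would emit, which is the point made above.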

astrange 5802 days ago

Compression algorithms are poorly suited to parallelism, because they remove everything that isn't a data dependency in the input, and parallelism is nothing but a lack of data dependencies.

The trick is to start at the largest chunk possible and go down until you find where they have left in some, uh, non-dependencies - like bzip2 which has independent x*100KB blocks, and video which (usually) has independent frames. You should be able to get 2-4 separate tasks out of that, which is good enough for CPUs.
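As a rough illustration of the bzip2 case, the Python sketch below compresses each 900 KB block as a separate bzip2 stream and concatenates the results. The names `compress_blocks` and `BLOCK` are made up for this example, and using one stream per block is an assumption of the sketch rather than a description of any particular tool. Python's `bz2.decompress` accepts concatenated streams, so the output still decodes in one call.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK = 900 * 1000  # bzip2's block size at compression level 9


def _compress_block(block: bytes) -> bytes:
    # Every block becomes a complete, self-contained bzip2 stream.
    return bz2.compress(block, 9)


def compress_blocks(data: bytes, block_size: int = BLOCK) -> bytes:
    """Compress independent blocks concurrently and concatenate the streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(_compress_block, blocks))
```

Because the blocks never depended on each other in the first place, splitting the work this way costs little beyond the per-stream headers.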

kanak 5802 days ago

> Let's face it - it's not the language that is the problem. It would help you, but it would help you only with some percentage of the problems out there.

Let's not underestimate the help that a language can provide; writing interpreters for languages is almost trivial in a Lisp because you can reuse pretty much every piece of the Lisp's machinery for your own ends. Similarly, there is an entire class of problems that is nearly trivial in Prolog but pretty difficult to get right in other languages, simply because Prolog makes it easy to express rules and specifications that need to be met. Just look at an implementation of a sudoku solver in Prolog and compare it with one in some other language.

I feel that a language designed with concurrency in mind would make it much simpler to solve an entire class of problems. These languages are just gaining traction, so we have yet to see bigger and more significant examples. However, the "ants.clj" demo that Rich Hickey has written in Clojure, and some of the Erlang demos in Joe Armstrong's book, have made me a believer.

wmf 5802 days ago

> certain algorithms, hence protocols, data structures, standards are not suited for parallel processing that well.

Sure, but for problems that can be parallelized you want a language to be able to express that parallelism. That may sound obvious, but many popular languages cannot do it. Let's not just give up on parallelism because it can't be used everywhere.
