by noss 5795 days ago
For me, this article introduces some useful terminology in relation to concurrency that I had not read about before. I am thinking about "SPMD, Single Program Multiple Data" and data-driven vs control-driven parallelism.

As an Erlang user and forum/chat hang-around, I frequently see people come in with a problem that is very data-driven, such as large matrix inversions or AI backtracking, and then dismiss Erlang as academic hot air because spawning an Erlang process per element and collecting the calculation results was too much work for no speedup. Obviously they had data-driven problems, and Erlang is made for control-driven problems.

I would love to find well-written and succinct articles on the difference between these two kinds of problem domains to refer people to. I find that it is very difficult to describe this well myself.

1 comment

I'm still not sure I understand the difference between data- and control-driven parallelism, primarily because I don't have a concrete example of the latter. Also, he distinguishes between threads and tasks, but I'm not really sure what the difference is... are the latter simply not preemptable?
The goal, the scarce resource one optimizes for, tends to be different. (To use the language from http://freakonomics.blogs.nytimes.com/2010/07/30/know-your-s..., posted here earlier.)

Control-driven concurrency tends to be I/O-bound, if bound at all, and the major issue tends to be dealing with the complexity of the rules, i.e. if-that-happens then-start-that-thing which-tells-that-system and-then-continues-with-that-task which-notices-the-registered-plugins. These rules should not block other concurrent but independent sessions. Scalability tends to be an issue here. A relevant scalability example for Erlang would be a telecom system where each added machine lets you handle X more subscribers.
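
To make the control-driven shape concrete, here is a minimal sketch in Python, with asyncio tasks standing in for Erlang processes. The names `session` and `run_sessions` and the sleep-based "I/O" are assumptions made up for illustration, not anything from the article. Each session reacts to its own events, and a slow session does not hold up an independent fast one.

```python
import asyncio


async def session(name, io_delay, log):
    """Hypothetical per-subscriber session: wait for I/O, react, repeat."""
    log.append(f"{name}: started")
    await asyncio.sleep(io_delay)  # stands in for waiting on the network
    log.append(f"{name}: event handled")
    await asyncio.sleep(io_delay)  # a follow-up step triggered by the event
    log.append(f"{name}: done")


async def run_sessions(delays):
    """Run one independent session per entry in `delays` and return the event log."""
    log = []
    await asyncio.gather(*(session(n, d, log) for n, d in delays.items()))
    return log


if __name__ == "__main__":
    for line in asyncio.run(run_sessions({"slow": 0.2, "fast": 0.01})):
        print(line)
```

The work here is almost all waiting, so adding sessions costs little CPU. The hard part in a real system is the rules inside each session, not the arithmetic.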

Data-driven concurrency tends to be more about making use of the concurrency available in the current hardware generation to speed up very CPU-bound computations. Using more threads than there are cores available just means more context switches and less performance. These are problems that likely also want to make use of SIMD instructions.
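
For contrast, a rough sketch of the data-driven shape, again in Python with hypothetical names (`chunk`, `parallel_sum_of_squares`): the same operation is applied to every element, the data is split into one slice per worker, and the worker count defaults to the number of cores rather than the number of elements. One caveat: in CPython the global interpreter lock means threads will not actually speed up pure-Python arithmetic like this, so treat it as an illustration of the structure only. A real version would use processes or a native numeric library.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n):
    """Split `data` into `n` contiguous slices of nearly equal length."""
    size, extra = divmod(len(data), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_sum_of_squares(data, workers=None):
    """Sum x*x over `data` using one worker per core (assumed sizing policy)."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One chunk per worker: more workers than cores would only add switching.
        partials = pool.map(lambda part: sum(x * x for x in part),
                            chunk(data, workers))
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

Note the difference from spawning one process per element: the unit of work is a large slice, so the bookkeeping for collecting results stays small compared with the computation.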