| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by RootDynasty 4083 days ago

Here is another cool version of parallel map in Erlang that uses a fold instead of two parallel maps.

  parallel_map(Fun, List) ->
      Last = lists:foldl(fun(Value, Parent) ->
          spawn(fun() ->
              MappedValue = Fun(Value),
              receive
                  Rest -> Parent ! [MappedValue|Rest]
              end
          end)
      end, self(), List),
      Last ! [],
      receive
          Result -> Result
      end.

It essentially sets up a chain of processes which pass their results to their parent when they finish their mapping operation.

1 comments

mononcqc 4083 days ago

This implementation does rely on a lot of memory copying going on there. Rather than copying the original element to the new process, then the new mapped element back to the original process (giving you the original list's worth of memory copied twice), you'll be copying each list element once to every worker you spawn, and then you'll copy the parts of the list worked on to the next worker, on and on, until the full list is finally copied back to the caller.

So instead of copying 2n elements over the processes, you'll be copying roughly 2nlogn (or is it n(n+1)/2 ?) elements around.

link

RootDynasty 4083 days ago

One thing to note is that the the parallel map consisting of two mapping operations has a different hidden overhead. If a message is received by a process that does not pattern match with any clauses in the receive block, the message is stored in a queue. When a new receive block is entered, all messages in the queue are pattern matched against the new receive block. In the worst case scenario, the worker process mapping the elements will finish in list order, so a great many pattern matches will be tried.

The solution to this problem is just to store messages as they are received inside of a map data structure, so that there is no overhead for receives. This requires indexing the list which makes the code a lot more inelegant.

Edit: Given n processors and an input of size n, I believe the time complexities are: Two map solution: O(n^2)+O(f) Fold solution: O(n^2)+O(f) Two map with map data structure: O(n)+O(f)

Where O(f) is the asymptotic upper bound of the function being mapped

Edit 2: This has me thinking about what will be the most efficient way to assign an element from the list to worker processes. In Erlang, it's clear that there isn't a way to avoid the O(n) overhead since the original process must reconstruct the list in order using cons.

In an imperative language this isn't necessary. Perhaps the original process can recursively assign indicies by spawning two children worker processes which are given a range of indicies to work on (who then create their own assignment processes). I believe the overhead is then only O(log n)...

link

lostcolony 4083 days ago

Just to comment, you're worrying about big O time complexities and that...really isn't what is going to dominate anything that is sufficiently complex to warrant a pmap rather than just a map. The constants on your computations will almost assuredly dwarf it.

The only O(n) operations are, yes, a completely degenerate case of when messages get sent back (which is -incredibly- unlikely; with Erlang's task scheduling you're likely only going to ever have a max message queue length of a few items, so it's more likely a constant factor. To get the degenerate case you would need it to finish in -reverse- list order, that is, would need to finish with the last one first, then the next to last one, etc), and when reversing the list(s) built up from the map at the end (as under the covers I'm pretty sure map is written to be tail recursive), which while technically O(n), is still incredibly fast.

link

RootDynasty 4083 days ago

I agree that worrying about the time complexity of the non-parallel portions of pmap is unlikely to be an issue for most use cases. It's still interesting to think about the tradeoffs though.

Hitting the degenerate case depends on the function you're computing in question. It's quite possible that the tasks will complete in the given order. I think you're giving too much credit to Erlang's task scheduler.

Also I'm not sure how one would even implement a tail recursive map function on a singly linked list. The cons operation can only add elements to the front of the list. I looked up how the map operation is implemented in Erlang. It isn't tail recursive:

https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7...

I'm interested to know how you'd implement a tail-recursive version of map (continuations aren't allowed).

link

lostcolony 4082 days ago

Interesting that it's not; I would have expected it to be. Given that, though, it's spawning tasks in list order, then receiving in list order; if the tasks complete in the order you spawned them, the first thing in the queue/arriving is always the item you're receiving on. That's the ideal case; more likely they'd be nearly in the order you spawned them, in which case you'd only have a few items to check through before you found the one you're receiving on.

I'm not saying the task scheduler is perfect, but I'd be really, really weirded out if it gave priority to the final process spawned, and worked its way backwards, which would be necessary for the degenerate case (that is, we spawned off items 1,2,3,4 in that order, but they completed 4,3,2,1. I would expect them to finish in close to 1,2,3,4 order, which would leave it at O(1) on each receive).

I'd implement a tail recursive map as -

  map(F, L) -> map(F, L, []).

  map(_, [], Acc) -> lists:reverse(Acc);
  map(F, [H|T], Acc) -> map(F, T, [F(H) | Acc]).

link