Yeah but I wonder about is, is the concurrency truly fine-grained and scalable? For example, you want to do a map over an array of 1 million elements. Can you dump a million blocks on a queue for a linear speed-up?
You could dump a million blocks in the queue, but that's probably too fine-grained. Each block is an allocation and there's some amount of overhead to executing blocks (though I've never measured how much). So unless each element actually takes a long time to process, you probably want to batch it up into larger segments. Dispatch has an API explicitly meant for this scenario (doing concurrent work over an array of elements), which is dispatch_apply() in C or DispatchQueue.concurrentPerform(iterations:execute:) in Swift, though the block is just given an index and it has to do the legwork of reading the correct elements from the input.