Just fyi, a library openstack (and myself and others) has created (in python) basically lets u program your workflow (really dataflow) as a DAG, and then the library will run it reliably (and in parallel and so-on). I'm more than willing to answer questions about how it does that if people are interested...