by oavdeev
4099 days ago
Multiple outputs are tricky: what are you going to do if you run a step and it somehow produces 2 out of 3 expected outputs? You need a way to decide whether the whole thing failed or not, and the easiest way is indeed to have dummy targets. Anyway, take a look at https://github.com/spotify/luigi; it is basically a make-like tool geared toward data pipelines. We are experimenting with it for our data pipeline (which is quite sizeable: ~30TB/day gzipped, thousands of files, a bunch of processing steps), and while it lacks some things, especially in the UI, this approach seems to work quite well.
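To make the dummy-target idea concrete, here is a rough sketch in plain Python (standard library only, deliberately not Luigi's API). The names `run_step`, `make_producer`, and the `_SUCCESS` marker file are all made up for illustration. The point is that the marker gets written last, and only if every expected output exists, so "2 out of 3 outputs" simply counts as not done.

```python
import os
import tempfile


def step_is_complete(marker_path):
    """A step counts as done only if its marker (the dummy target) exists."""
    return os.path.exists(marker_path)


def run_step(out_dir, produce, expected_outputs, marker_name="_SUCCESS"):
    """Run `produce`, then write the marker only if all expected outputs exist.

    Returns True if the step is complete afterwards, False otherwise.
    `marker_name` is an assumed convention, not something any tool requires.
    """
    marker_path = os.path.join(out_dir, marker_name)
    if step_is_complete(marker_path):
        return True  # already done, so skip the work like make would

    produce(out_dir)

    missing = [name for name in expected_outputs
               if not os.path.exists(os.path.join(out_dir, name))]
    if missing:
        return False  # partial result: no marker, so the step reruns next time

    # Write the marker last so its presence implies every output is present.
    with open(marker_path, "w") as f:
        f.write("\n".join(expected_outputs))
    return True


def make_producer(names):
    """Hypothetical stand-in for a real processing step: writes the given files."""
    def produce(out_dir):
        for name in names:
            with open(os.path.join(out_dir, name), "w") as f:
                f.write("data\n")
    return produce


if __name__ == "__main__":
    expected = ["a.txt", "b.txt", "c.txt"]
    with tempfile.TemporaryDirectory() as out_dir:
        # First run produces only 2 of 3 outputs, so the step is not complete.
        print(run_step(out_dir, make_producer(expected[:2]), expected))
        # Second run produces everything, so the marker is written.
        print(run_step(out_dir, make_producer(expected), expected))
```

Downstream steps then depend on the marker alone rather than on the individual files, which is the same trick as an empty stamp target in a Makefile.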