Hacker News
by oavdeev 4099 days ago
Multiple outputs are tricky: what are you going to do if you run a step and it somehow produces 2 out of 3 expected outputs? You need a way to decide whether the whole thing failed or not, and the easiest way is indeed to have dummy targets.
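
To make the dummy-target idea concrete, here is a minimal sketch in Python rather than make. The stamp file name `.step.done`, the helper `run_step`, and the example file names are all made up for illustration; the only point is that the stamp is written after every expected output has been checked, so a 2-out-of-3 run counts as a failure.

```python
import tempfile
from pathlib import Path

STAMP = ".step.done"  # hypothetical dummy-target name
EXPECTED = ["a.csv", "b.csv", "c.csv"]  # hypothetical outputs of one step


def run_step(workdir, expected, produce):
    """Run `produce`, then touch the stamp only if every expected output exists."""
    stamp = workdir / STAMP
    stamp.unlink(missing_ok=True)  # a stale stamp must not survive a rerun
    produce(workdir)
    missing = [name for name in expected if not (workdir / name).exists()]
    if missing:
        return False, missing  # all-or-nothing: a partial result is a failure
    stamp.touch()
    return True, []


def produce_partial(workdir):
    # Simulates a step that writes only 2 of its 3 outputs.
    (workdir / "a.csv").write_text("1\n")
    (workdir / "b.csv").write_text("2\n")


def produce_full(workdir):
    produce_partial(workdir)
    (workdir / "c.csv").write_text("3\n")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        first = run_step(workdir, EXPECTED, produce_partial)
        stamp_after_first = (workdir / STAMP).exists()
        second = run_step(workdir, EXPECTED, produce_full)
        stamp_after_second = (workdir / STAMP).exists()
        return first, stamp_after_first, second, stamp_after_second
```

Downstream steps would then depend on the stamp alone instead of on the three files individually.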

Anyway, take a look at https://github.com/spotify/luigi; it is basically a make-like tool geared toward data pipelines. We are experimenting with it for our data pipeline (which is quite sizeable: ~30TB/day gzipped, thousands of files, a bunch of processing steps), and while it lacks some things, especially in the UI, this approach seems to work quite well.


Why would you not want to consider the step failed if it didn't produce all its expected targets?
Yeah, that's my point: if you're happy with this all-or-nothing logic, you can just use dummy target files.

Allowing steps to partially succeed seems like adding quite a lot of complexity, and I'm not sure where it would be beneficial.

Dummy target files don't work if the files are deleted elsewhere, i.e. not as part of the make pipeline.
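
A small sketch of that failure mode, again in Python with made-up names (`.step.done`, `stamp_says_done`, `outputs_say_done`): once a real output is removed behind the pipeline's back, a check that looks only at the stamp still reports the step as done, while a check against the actual files does not.

```python
import tempfile
from pathlib import Path

STAMP = ".step.done"  # hypothetical dummy-target name
EXPECTED = ["a.csv", "b.csv", "c.csv"]  # hypothetical outputs of one step


def stamp_says_done(workdir):
    """Completeness as a dummy target sees it: only the stamp matters."""
    return (workdir / STAMP).exists()


def outputs_say_done(workdir, expected):
    """Completeness checked against the real files."""
    return all((workdir / name).exists() for name in expected)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # The step ran successfully: all outputs plus the stamp exist.
        for name in EXPECTED:
            (workdir / name).write_text("data\n")
        (workdir / STAMP).touch()
        # Something outside the pipeline deletes one output.
        (workdir / "b.csv").unlink()
        return stamp_says_done(workdir), outputs_say_done(workdir, EXPECTED)
```

Here `demo()` returns `(True, False)`: the stamp is stale, and only the per-file check notices.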