by dwhitena 3328 days ago
Reproducibility and provenance are essential for data science workflows, and it's important to maintain them at scale. Take a look at Pachyderm if you get a chance (full disclosure: I work on the project). We version data in addition to code for full reproducibility, even for distributed, multi-stage, multi-language pipelines.