by mcncfie 2688 days ago
Thanks! Regarding 2, could you give an example?

Also, can I combine DVC with a pipeline tool like Apache Airflow?


For example, a DB query gives you different results over time because the data/table keeps evolving. So you just store the query output (let's say a couple of GBs) in DVC to make your research reproducible.

This is like assigning a random seed to the DB :)
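A minimal Python sketch of the idea, not DVC's own API: run the query once, freeze the result to a file, and record a content hash of that file (DVC likewise identifies tracked files by a content hash). SQLite stands in for the real database, and `snapshot_query`, `load_snapshot`, the `events` table and the file names are all hypothetical. In a real project you would then track the frozen file with `dvc add` and commit the generated `.dvc` file to Git.

```python
import csv
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def snapshot_query(conn: sqlite3.Connection, query: str, out_path: Path) -> str:
    """Run `query` once, write its result to a CSV file and return the file's MD5 hash."""
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
    # The hash identifies this exact snapshot by its content.
    return hashlib.md5(out_path.read_bytes()).hexdigest()


def load_snapshot(path: Path) -> list:
    """Read the frozen rows back; experiments use this instead of re-querying the DB."""
    with path.open(newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 0.5), (2, 1.5)])

    workdir = Path(tempfile.mkdtemp())
    query = "SELECT id, value FROM events ORDER BY id"
    v1 = workdir / "events_v1.csv"
    h1 = snapshot_query(conn, query, v1)

    # The table keeps evolving after the snapshot was taken...
    conn.execute("INSERT INTO events VALUES (3, 2.5)")
    h2 = snapshot_query(conn, query, workdir / "events_v2.csv")

    # ...but the frozen file still holds the original two rows.
    print(load_snapshot(v1))
    print(h1 != h2)
```

The same query returns different data once the table changes, so the hashes differ, while anyone re-running the experiment against `events_v1.csv` gets exactly the rows the original run saw.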

Sure, some teams combine DVC with Airflow. It gives a clear separation between engineering (reliability) and data science (lightweight, quick iteration). A recent discussion about this: https://twitter.com/FullStackML/status/1091840829683990528