Example: a query to a DB gives you different results over time because the data/table keeps evolving. So you just store the query output (let's say a couple of GBs) in DVC to make your research reproducible.
This is like assigning a random seed to a DB :)
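A rough sketch of the idea, in case it helps. Everything here is an assumption for illustration: the in-memory SQLite database stands in for the real DB, and the `events` table, the `snapshot_query` helper and the `query_output.csv` file name are all hypothetical. The only DVC piece is the `dvc add` step mentioned in the last comment, which you would run from the shell.

```python
import csv
import hashlib
import os
import sqlite3
import tempfile


def snapshot_query(conn, sql, path):
    """Run `sql` and freeze the returned rows into a CSV file at `path`."""
    cursor = conn.execute(sql)
    header = [col[0] for col in cursor.description]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
    return path


def file_md5(path):
    """Hash the snapshot so we can check that it does not change."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 0.5), (2, 1.5)])

query = "SELECT id, value FROM events ORDER BY id"
snapshot_path = os.path.join(tempfile.mkdtemp(), "query_output.csv")
snapshot_query(conn, query, snapshot_path)
hash_before = file_md5(snapshot_path)

# The table evolves: the same query now returns a different result...
conn.execute("INSERT INTO events VALUES (3, 2.5)")
live_rows = conn.execute(query).fetchall()

# ...but the stored snapshot is untouched, so downstream steps stay reproducible.
hash_after = file_md5(snapshot_path)

# Next step, from the shell inside a DVC-initialised Git repo:
#   dvc add query_output.csv
#   git add query_output.csv.dvc .gitignore
```

The research code then reads from the snapshot file instead of hitting the live DB, and the small `.dvc` file committed to Git pins which version of the output was used.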
Sure, some teams combine DVC with Airflow. It gives a clear separation between engineering (reliability) and data science (lightweight, quick iteration). A recent discussion about this: https://twitter.com/FullStackML/status/1091840829683990528