by dmpetrov
2688 days ago
Example: a query to a DB gives you different results over time because the data or the table evolves. So you just store the query output (let's say a couple of GBs) in DVC to make your research reproducible. This is like assigning a random seed to the DB :)

Sure, some teams combine DVC with Airflow. It gives a clear separation between engineering (reliability) and data science (lightweight and quick iteration). A recent discussion about this: https://twitter.com/FullStackML/status/1091840829683990528
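To make the "store the query output" idea concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the helper name `snapshot_query`, the SQLite connection, and the file path are all made up for this example, and a real setup would point at whatever DB and query you actually use. The helper freezes one query result into a CSV file and returns its checksum, so a later change to the table cannot silently change the data your experiment reads.

```python
import csv
import hashlib
import sqlite3
from pathlib import Path


def snapshot_query(conn, sql, out_path):
    """Run `sql`, write the rows to a CSV file, and return the file's SHA-256.

    Hypothetical helper for illustration only; it is not part of DVC.
    """
    cursor = conn.execute(sql)
    # Column names of the result set become the CSV header row.
    header = [col[0] for col in cursor.description]
    out_path = Path(out_path)
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
    # The digest changes whenever the query result changes.
    return hashlib.sha256(out_path.read_bytes()).hexdigest()
```

After writing the file once, you would track it with `dvc add query_output.csv` (the file name is an assumption) and commit the resulting `.dvc` file to Git. From then on the experiment reads the frozen file instead of re-running the query against the live DB.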