by jdoliner 1352 days ago
DVC is great for use cases that don't reach this scale or have these needs, and the issues here are non-trivial to solve. I've spent a lot of time figuring out how to solve them in Pachyderm, which is a good fit when you do need higher levels of scale or might run into merge conflicts with DVC. There are trade-offs, though: DVC is definitely easier for a single developer or data scientist to get up and running with.
1 comment

I think it's worth noting that DVC can be used to track artifacts that have been generated by other tools. For example, you could use MLflow to run several model experiments, and then track the resulting artifacts with DVC. Personally, I think this is the best way to use it.

However, I agree that in general it's best for smaller projects and use cases. For example, it still shares the primary deficiency of Make in that it can only track files on the file system, and not things like ensuring a database table has been created (unless you 'touch' your own sentinel files).
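
To make the sentinel-file workaround concrete, here is a rough sketch of what I mean. It assumes a SQLite database purely for illustration; the function name, the `features` table, and the paths are all hypothetical and not part of DVC or Make. The idea is just that the file-based tool tracks the sentinel, since it can't see the table itself.

```python
import sqlite3
from pathlib import Path


def create_table_with_sentinel(db_path: str, sentinel_path: str) -> bool:
    """Create a table, then touch a sentinel file that a file-based tool can track.

    Returns True if this call did the work, False if the sentinel already existed.
    The table name and schema are hypothetical placeholders.
    """
    sentinel = Path(sentinel_path)
    if sentinel.exists():
        # A previous run already recorded the table as created.
        return False

    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features (id INTEGER PRIMARY KEY, value REAL)"
        )
        conn.commit()
    finally:
        conn.close()

    # Touch the sentinel only after the database step succeeded, so the
    # file's existence stands in for "the table exists".
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch()
    return True
```

The obvious weakness is that the sentinel can drift from reality: if someone drops the table by hand, the file still says it exists, which is exactly the limitation of tracking only the file system.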