by bs7280 1350 days ago
What value does this provide that I can't get by versioning my data in partitioned parquet files on s3?
1 comments

I don't think Parquet helps with images, video, or ML models.

Also, physically versioning the data (e.g. partitioned Parquet files, cloud bucket versioning, etc.) is one thing; having a mechanism to save / codify the dataset version in the project is another. E.g. to answer the question "which version of the data was this model built with?", you need to save some identifier / hash / list of the files that were used. DVC takes care of that part as well.
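To make the "identifier / hash / list of files" idea concrete, here is a rough sketch of doing it by hand. This is not DVC's actual file format or implementation, just an illustration of the concept; the function names and the `data.lock.json` filename in the usage note are made up for the example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash one file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_manifest(root: Path) -> dict:
    """Map each file under root (relative POSIX path) to its content hash."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def dataset_version(manifest: dict) -> str:
    """Collapse the manifest into a single identifier for the whole dataset."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def write_lockfile(root: Path, lockfile: Path) -> str:
    """Write a small JSON file (hypothetical format) to commit with the code."""
    manifest = dataset_manifest(root)
    version = dataset_version(manifest)
    lockfile.write_text(
        json.dumps({"version": version, "files": manifest}, indent=2)
    )
    return version
```

You would call something like `write_lockfile(Path("data"), Path("data.lock.json"))` and commit the resulting file to git alongside the training code, so each commit pins the exact data it was built with. The data itself can still live wherever you like (S3, Parquet partitions, etc.); only the small lockfile goes into the repo.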

(It also has mechanisms for caching data that you download, Makefile-like pipelines, etc.)