by shcheklein 1349 days ago
I don't think Parquet will help with images, video, or ML models.

Also, physically providing a way to version data (e.g. partitioned Parquet files, cloud versioning, etc.) is one thing; having a mechanism for saving / codifying the dataset version in the project is another. E.g. to answer the question "which version of the data was this model built with?" you need to save some identifier / hash / list of the files that were used. DVC takes care of that part as well.

(It also has mechanisms for caching the data you download, Makefile-like pipelines, etc.)
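
To make the "save some identifier / hash / list of files" part concrete, here is a rough sketch in plain Python. It is not how DVC does it internally. It only illustrates the idea of a small manifest you could commit next to the code. The function names (`file_sha256`, `dataset_manifest`, `write_manifest`) and the manifest layout are made up for this example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_manifest(data_dir: str) -> dict:
    """Return per-file hashes plus one identifier for the whole dataset."""
    root = Path(data_dir)
    files = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    # The dataset id is a hash over the sorted path -> hash listing, so it
    # changes whenever a file is added, removed, renamed, or modified.
    listing = json.dumps(files, sort_keys=True).encode("utf-8")
    return {"dataset_id": hashlib.sha256(listing).hexdigest(), "files": files}


def write_manifest(data_dir: str, manifest_path: str) -> dict:
    """Write the manifest as JSON; this small file is what gets committed."""
    manifest = dataset_manifest(data_dir)
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text, encoding="utf-8")
    return manifest
```

You would call something like `write_manifest("data/", "data.manifest.json")` before training and commit the JSON file together with the code, keeping the manifest outside the data directory so it does not hash itself. The `dataset_id` then answers "which data was this model built with" without putting the data itself into git.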