by shcheklein
1349 days ago
I don't think Parquet helps with images, video, or ML models. Also, physically providing a way to version data (e.g. partitioned Parquet files, cloud versioning, etc.) is one thing; having a mechanism for saving / codifying the dataset version into the project is another. E.g. to answer the question "which version of the data was this model built with?" you would need to save some identifier / hash / list of the files that were used. DVC takes care of that part as well. (It also has mechanics to cache data that you download, Makefile-like pipelines, etc.)
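To make the "identifier / hash / list of files" idea concrete, here is a rough sketch of what codifying a dataset version could look like. This is not DVC's actual file format or algorithm; `file_sha256`, `dataset_manifest`, and the manifest layout are all hypothetical names chosen for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large files (video, models) fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_manifest(data_dir: Path) -> dict:
    """Build a manifest: per-file hashes plus one hash for the whole dataset.

    Hypothetical layout, for illustration only.
    """
    entries = [
        {"path": p.relative_to(data_dir).as_posix(), "sha256": file_sha256(p)}
        for p in data_dir.rglob("*")
        if p.is_file()
    ]
    # Sort by relative path so the result does not depend on directory walk order.
    entries.sort(key=lambda e: e["path"])
    # The dataset identifier is a hash over the sorted list of (path, hash) pairs.
    payload = json.dumps(entries, sort_keys=True).encode("utf-8")
    return {"dataset_sha256": hashlib.sha256(payload).hexdigest(), "files": entries}
```

If something like `json.dumps(dataset_manifest(Path("data")), indent=2)` is written to a small file and committed next to the training code, each commit then pins the exact data it was built with, while the data itself can live anywhere (bucket, cache, etc.).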