
TensorBoard doesn't do that; I was referring to things a dataset/model management tool should do. For us, TensorBoard tracks the datasets as hyperparams. The multiple versions of the data themselves end up being handled on the warehouse side. Prefect is what we use to run the DAGs that build the different versions.

Handling code and data separately is important, so that either one can be updated easily. Keeping them loosely coupled allows quicker updates than having to increment versions on both, as DVC requires. DVC is also far heavier weight: it pulls the data referenced in the dvc files, and you have to pick out on the CLI which ones you want.

Downloading to a local cache on demand, from your actual scripts, works much better. It's just like what transformers does for pre-trained models.
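
To make the download-on-demand idea concrete, here is a minimal stdlib-only sketch. It is not how transformers or our own pipeline implements it; `fetch_cached` and the cache layout are hypothetical names made up for illustration, and it assumes the dataset version is encoded in the URL, so each version gets its own cache entry.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path) -> Path:
    """Return a local path for `url`, downloading it only on a cache miss.

    Hypothetical helper: the cache key is the SHA-256 of the URL, so a
    different (versioned) URL always maps to a different cached file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_dir / key

    # Cache hit: no network access at all.
    if target.exists():
        return target

    # Cache miss: download to a temp file in the same directory, then
    # rename it into place so readers never see a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the partial download before re-raising.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

A training script would then call something like `fetch_cached(dataset_url, Path.home() / ".cache" / "datasets")` right where it needs the file, and only the versions that script actually touches ever get downloaded.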