by davidatbu
1349 days ago
How do you merge multiple versions of data using TensorBoard? Or what other tool handles that for you? What's the case for handling code and data separately? In my experience, the primary motivation for using such a tool is easy reproducibility through easy tracking of code, hyperparams, and data. It's not obvious to me how that goal would be advanced by tracking code and data separately.
Handling code and data separately is important because it lets you update one without the other. Keeping them loosely coupled allows quicker updates, rather than having to increment versions of both, as with DVC. DVC is also far heavier-weight: it pulls the data referenced in the .dvc files, and you have to pick out on the CLI which ones you want.
Downloading data on demand into a local cache, directly from your actual scripts, works much better. It's just like what transformers does for pre-trained models.
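The download-on-demand pattern can be sketched roughly like this. It is a minimal sketch, not how transformers or DVC actually implement caching: `fetch_cached` and the cache directory are hypothetical names, and it assumes each data version is published under its own URL, so moving to a new data version means changing a URL rather than re-versioning the code.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical default cache location; not taken from any real tool.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "data_cache_sketch"


def fetch_cached(url: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Path:
    """Return a local path for `url`, downloading it only on first use."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the entry by a hash of the URL: a new data version (new URL)
    # gets its own cache entry, and old versions stay usable.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    suffix = Path(urllib.parse.urlparse(url).path).suffix
    target = cache_dir / (key + suffix)
    if target.exists():
        return target  # cache hit: no network access at all
    # Download into a temp file in the same directory, then atomically
    # move it into place, so an interrupted download never leaves a
    # partial file that would later look like a cache hit.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        with urllib.request.urlopen(url) as resp, open(tmp_name, "wb") as out:
            shutil.copyfileobj(resp, out)
        os.replace(tmp_name, target)
    except BaseException:
        os.remove(tmp_name)
        raise
    return target
```

A training script would then just call something like `fetch_cached("https://example.com/datasets/reviews-v2.csv")` (a made-up URL) and only ever pull the files it actually uses, instead of a separate CLI step selecting which data to fetch.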