Hacker News
by davidatbu 1349 days ago
How do you merge multiple versions of data using tensorboard? Or what other tool handles that for you?

What's the case for handling code and data separately? In my experience, the primary motivation for using such a tool is easy reproducibility through easy tracking of code, hyperparams, and data. It's not obvious to me how tracking code and data separately advances that goal.


Tensorboard doesn't do that; I was referring to things a dataset/model management tool should do. For us, Tensorboard tracks the datasets as hyperparams. The multiple versions of the data themselves are handled on the warehouse side, and Prefect is what we use to run the DAGs that build the different versions.
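To make "datasets as hyperparams" concrete, here is a minimal sketch, assuming PyTorch's `torch.utils.tensorboard.SummaryWriter` is the logging path. The helper names, the table name `warehouse.reviews_clean`, and the version string are hypothetical and only there for illustration; the real setup may look different.

```python
def build_hparams(lr: float, batch_size: int, dataset: str, dataset_version: str) -> dict:
    """Put the dataset reference next to the ordinary training hyperparameters."""
    return {
        "lr": lr,
        "batch_size": batch_size,
        "dataset": dataset,                  # hypothetical warehouse table name
        "dataset_version": dataset_version,  # hypothetical version label
    }


def log_run(hparams: dict, metrics: dict, log_dir: str) -> None:
    """Write one run to TensorBoard's HParams view."""
    # Imported lazily so build_hparams works without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    with SummaryWriter(log_dir) as writer:
        writer.add_hparams(hparams, metrics)


hparams = build_hparams(3e-4, 32, "warehouse.reviews_clean", "2021-03-01")
# After training: log_run(hparams, {"hparam/val_loss": 0.42}, "runs/example")
```

With this, runs trained on different dataset versions can be compared side by side in the HParams tab, without the tracking tool ever touching the data itself.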

Handling code and data separately is important because it lets you update one without touching the other. Keeping them loosely coupled makes updates quicker than with DVC, where you have to increment versions on both. DVC is also far heavier weight: it pulls the data referenced in the .dvc files, and you have to pick out on the CLI which ones you want.

Downloading to a local cache on demand, from your actual scripts, works much better. It's just like what transformers does for pre-trained models.
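A rough sketch of that download-on-demand pattern, using only the standard library. The function names and the cache layout (one file per URL, named by the SHA-256 of the URL) are assumptions made for illustration, not how transformers or any particular tool implements it.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def _download(url: str) -> bytes:
    """Default fetcher: read the whole response body into memory."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], bytes] = _download) -> Path:
    """Return a local path for `url`, downloading only on a cache miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on the URL so each dataset version gets its own file.
    target = cache / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        # Write to a temp file first so an interrupted download
        # never leaves a partial file at the final path.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(fetch(url))
        tmp.replace(target)
    return target
```

A training script would call `cached_fetch(dataset_url, "~/.cache/my_datasets")` (after expanding `~`) right before loading, so the first run pays for the download and later runs read from disk.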

I forgot to say thanks for this!

> Tensorboard tracks the datasets as hyperparams.

Clever!

> Warehouse side .. Prefect

I'll have to check out warehouse-side things and Prefect to see what you mean.

Appreciate all the pointers!