Hacker News
by smeagull 1358 days ago
I don't think this tool can encompass everything you need in managing ML models and data sets, even if you limit it to versioning data.

I'd need such a tool to manage features, checkpoints and labels. This doesn't do any of that. Nor does it really handle merging multiple versions of data.

And I'd really like the code to be handled separately from the data. Git is not the place to do this, because the choice of which code to pair with which data should happen at a higher level and be tracked along with the results. That doesn't belong in a repo; MLFlow or Tensorboard handles it better.


How do you merge multiple versions of data using tensorboard? Or what other tool handles that for you?

What's the case for handling code and data separately? In my experience, the primary motivation for using such a tool is easy reproducibility through easy tracking of code, hyperparams, and data. It's not obvious to me how that goal would be advanced by tracking code and data separately.

Tensorboard doesn't do that; I was referring to things a dataset/model management tool should do. For us, Tensorboard tracks the datasets as hyperparams. The actual multiple versions of data end up being handled on the warehouse side. Prefect is what we use for running the DAGs that produce the different versions.
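
To make "datasets as hyperparams" concrete, here is a rough sketch of the idea, not our actual setup. The function names, the field names, and the JSON-lines log file are all made up for illustration; the point is only that the dataset name and version sit in the same flat dict as the learning rate, next to the code commit and the results.

```python
import json


def run_record(code_commit, dataset_name, dataset_version, hparams, metrics):
    """Bundle code version, dataset version, and results into one record.

    All field names here are hypothetical, chosen for this sketch.
    """
    params = dict(hparams)
    # The dataset is recorded like any other hyperparameter.
    params["dataset_name"] = dataset_name
    params["dataset_version"] = dataset_version
    # The code version is tracked next to it, not inside the same repo.
    params["code_commit"] = code_commit
    return {"hparams": params, "metrics": dict(metrics)}


def append_run(log_path, record):
    """Append one run as a JSON line (a stand-in for a real tracker)."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

The values you pass in are whatever identifies your code and data, e.g. a commit hash and a warehouse snapshot date. With a real tracker you would hand the `hparams` and `metrics` dicts to it instead of writing a file; in PyTorch's TensorBoard writer, for instance, that would be `SummaryWriter.add_hparams(hparam_dict, metric_dict)`.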

Handling code and data separately is important because it allows easy updates to one or the other. They are loosely coupled so that either can change quickly, rather than having to increment versions on both as with DVC. DVC is also far heavier weight: it pulls the data referenced in the dvc files, and you have to pick out on the CLI which ones you want.

Downloading data to a local cache on demand, directly from your actual scripts, works much better. It's just like what transformers does for pre-trained models.
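
A minimal sketch of that download-on-demand pattern, using only the standard library. This is not how transformers implements its cache; `cached_path` and the cache directory are names I made up, and I'm assuming each dataset version is reachable at its own URL.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical default cache location; not taken from any real tool.
DEFAULT_CACHE = Path.home() / ".cache" / "my_datasets"


def cached_path(url, cache_dir=DEFAULT_CACHE):
    """Return a local path for `url`, downloading only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the entry on the URL so each dataset version gets its own file.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_dir / key
    if not target.exists():
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after the download finishes, so a partial file
        # is never mistaken for a complete cache entry.
        tmp.replace(target)
    return target
```

A training script would then call `cached_path(dataset_url)` and read from the returned path; the first call downloads, later calls reuse the local copy. Which URL to pass is decided by the run config, so the code repo never has to carry or pin the data.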

I forgot to say thanks for this!

> Tensorboard tracks the datasets as hyperparams.

Clever!

> Warehouse side .. Prefect

I'll have to check out warehouse-side things and Prefect to see what you mean.

Appreciate all the pointers!