by smeagull
1358 days ago
I don't think this tool can cover everything you need for managing ML models and data sets, even if you limit it to versioning data. I'd need such a tool to manage features, checkpoints and labels, and this doesn't do any of that. Nor does it really handle merging multiple versions of data. And I'd really like the code to be handled separately from the data. Git is not the place to do this, because the choice of which code and data to pair should happen at a higher level and be tracked along with the results. That's not going in a repo; MLflow or TensorBoard handles it better.
What's the case for handling code and data separately? In my experience, the primary motivation for using such a tool is easy reproducibility through easy tracking of code, hyperparams, and data. It's not obvious to me how that goal would be advanced by tracking code and data separately.