by rsfern 1211 days ago
Oxen seems more like git (with GitHub-style integration via Oxenhub) for ML datasets, whereas DVC is a bit more like make (with S3, LFS, etc. integration) for ML datasets. It seems like Oxen has finer-grained version control and diff capability, but as far as I can tell it doesn’t have as many features to track and version derived data along with the code that produced it (like `dvc repro`).

We definitely have some of these features on our roadmap! Anything particularly helpful in DVC's workflow that you think we should prioritize?
One thing I love about DVC is that it doesn't need its own server. I can just push/pull files via SSH. I don't really want one more service that I need to keep running. I also happen to have a lot of space available to me on a server I can't install extra services on, so oxen requiring that is a deal breaker for me.
This is the real deal breaker for me. DVC is super slow, but it works with S3 (one of the greatest technologies built in the last 15 years). At our company, we've written our own (10x faster) version of DVC for the commonly used features.
We have S3 backend support among our upcoming features; agreed, it's essential.
Good feedback, we're working on more streaming features as well as supporting different backends for the CLI.

Any other features you would find useful or a dealbreaker?

Perhaps this is outside the scope of what Oxen aims to do, but I like that DVC has a way for me to specify scripts and dependencies and then decide what needs to be regenerated (and what doesn't) when dependencies change.
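To make the idea concrete, here is a rough sketch of that kind of staleness check. It is not DVC's actual implementation, just an illustration under assumptions: the `stages` mapping, the JSON lock file layout, and the function names are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(deps):
    """Map each dependency path to the digest of its current contents."""
    return {str(dep): file_digest(dep) for dep in deps}


def stale_stages(stages, lock_path):
    """Return the stages whose dependency hashes differ from the lock file.

    `stages` maps a stage name to a list of dependency paths
    (a hypothetical layout, not DVC's dvc.yaml schema).
    """
    lock_file = Path(lock_path)
    recorded = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    return [name for name, deps in stages.items()
            if recorded.get(name) != snapshot(deps)]


def record_stages(stages, lock_path):
    """Write the current dependency hashes for every stage to the lock file."""
    current = {name: snapshot(deps) for name, deps in stages.items()}
    Path(lock_path).write_text(json.dumps(current, indent=2))
```

After a successful run you would call `record_stages`, and before the next run `stale_stages` tells you which scripts to re-execute. This sketch only compares direct dependencies; a real pipeline tool would also propagate staleness to downstream stages whose inputs are regenerated.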
Cool! To be honest I don’t really use DVC much, but the project version control features are what really interest me. I like how data pipelines help align versioned artifacts like model checkpoints and visualizations with the datasets and code that produced them. I work as a computational scientist, and that sort of reproducibility tool is really important, since a lot of us don’t have the best software engineering skills/discipline.

From your README, it seems like the Oxen repo and the software project repo are not as closely coupled as in DVC? It seems like, in Oxen's current state, you could do something similar with Makefiles and Oxen tracking?

Oxen seems really good for longer lived data and computational science projects, where dvc seems more oriented just at analysis projects. I have a project that I want to try it out on :)