| HN Mirror

Hacker News


	by banga 1210 days ago
	How does this compare with other systems, like DVC (https://dvc.org/) for example?

3 comments

gschoeni 1209 days ago

Raw speed on large datasets of images, video, audio, etc. is one factor. Some performance numbers can be found here:

https://github.com/Oxen-AI/oxen-release/blob/main/Performanc...


barcoded 1203 days ago

Looks like in those benchmarks Oxen.AI makes a misguided assumption that benchmarking DVC is (roughly...?) the same as benchmarking DVC<>DAGShub (server side made by a different company). To my understanding DAGShub is a bottleneck there. They didn't care to benchmark DVC against an S3 bucket or a similar cloud storage that is more widely used. I wonder if it's because DAGShub makes this whole setup wayyy slower


gschoeni 1202 days ago

Oxen dev here - let me add some benchmarks for DVC backed by an S3 bucket. I did it a while back and we were still faster, but I agree it's a good benchmark to have.

Fundamentally, even adding and committing data locally is slower with DVC, before any push happens. But I agree the remote matters too.


twic 1209 days ago

But what on earth is this measuring?

  oxen push origin main # ~308.98 secs

Where does that push to? Does this benchmark really just measure how well-provisioned various different VC-funded websites currently are?

I think a proper benchmark here would be to install the server parts of Oxen, Git-LFS, etc. on the same machine, and then time how long it takes to commit and push the same dataset from some other machine.

Although of course given that we live in an age where people expect to upload their immense datasets to the cloud for some reason, a "proper" benchmark might not be a relevant one. I'm not sure what a really good benchmark of that would be.
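The same-server comparison proposed above could be timed with a small harness along these lines. This is only a sketch: the `time_command` and `benchmark` helpers are hypothetical names, and the step shown is a no-op placeholder so the script runs anywhere. For a real comparison, each tool's own add, commit, and push commands (for example the `oxen push origin main` quoted above) would be substituted, all pointed at servers hosted on the same machine.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command fails,
    # so a failed push is never recorded as a fast one.
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def benchmark(steps, cwd=None):
    """Time each (name, cmd) step in order and return {name: seconds}."""
    return {name: time_command(cmd, cwd=cwd) for name, cmd in steps}


if __name__ == "__main__":
    # Placeholder step (an assumption for illustration): replace with the
    # real commit/push commands of each tool under test.
    steps = [("noop", [sys.executable, "-c", "pass"])]
    for name, secs in benchmark(steps).items():
        print(f"{name}: ~{secs:.2f} secs")
```

Wall-clock time is measured on purpose, since network transfer dominates a push. Running each step several times and reporting the spread would make the numbers more trustworthy than a single run.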


gschoeni 1202 days ago

Will add a local network benchmark as well! Many reasons to upload your data to the cloud...but agree that there are use cases where you might just want to version on your local network.


rsfern 1210 days ago

Oxen seems more like git (with GitHub integration (Oxenhub)) for ML datasets, whereas DVC is a bit more like make (with S3, LFS, etc. integration) for ML datasets. It seems like Oxen has finer-grained version control and diff capability, but as far as I can tell it doesn’t have as many features to track and version derived data along with the code that produced it (like `dvc repro`).


gschoeni 1209 days ago

We definitely have some of these features on our roadmap! Anything particularly helpful in DVC's workflow that you think we should prioritize?


michaelmior 1209 days ago

One thing I love about DVC is that it doesn't need its own server. I can just push/pull files via SSH. I don't really want one more service that I need to keep running. I also happen to have a lot of space available to me on a server I can't install extra services on, so oxen requiring that is a deal breaker for me.


bagavi 1209 days ago

This is the real deal breaker for me. DVC is super slow but it works with S3 (one of the greatest technologies built in the last 15 years). At our company, we've written our own (10x) faster version of DVC for commonly used features.


gschoeni 1209 days ago

We have support for an S3 backend among our upcoming features; agree it's essential.


gschoeni 1209 days ago

Good feedback, we're working on more streaming features as well as supporting different backends for the CLI.

Are there any other features you would find useful, or whose absence would be a dealbreaker?


michaelmior 1209 days ago

Perhaps this is outside the scope of what Oxen aims to do, but I like that DVC has a way for me to specify scripts and dependencies and then decide what needs to be regenerated (and what doesn't) when dependencies change.
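The "decide what needs to be regenerated" idea can be illustrated with a content-hash check. This is a minimal sketch of the general technique, not DVC's actual implementation; the helper names `file_digest`, `snapshot`, and `is_stale` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(deps):
    """Map each dependency path to its current digest.

    Raises FileNotFoundError if a dependency is missing.
    """
    return {str(p): file_digest(p) for p in deps}


def is_stale(deps, recorded):
    """True when any dependency differs from the digests recorded
    after the last successful run, meaning the stage should rerun."""
    return snapshot(deps) != recorded
```

After a stage runs, the result of `snapshot(deps)` would be saved (for example as JSON next to the output). On the next invocation, the stage's script is rerun only if `is_stale(deps, recorded)` is true. Hashing contents rather than comparing modification times means a file that was touched but not changed does not trigger a rebuild.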


rsfern 1209 days ago

Cool! To be honest I don’t really use dvc much, but the project version control features are what really interest me. I like how data pipelines help align versioned artifacts like model checkpoints and visualizations with the datasets and code that produced them. I work as a computational scientist and that sort of reproducibility tool is really important, and a lot of us don’t have the best software engineering skills/discipline.

From your readme it seems like the oxen repo and software project repo are not as closely coupled as in dvc? It seemed like in the current state of oxen, you could do something similar with make files and oxen tracking?

Oxen seems really good for longer-lived data and computational science projects, whereas dvc seems oriented more toward analysis projects. I have a project that I want to try it out on :)


bravura 1209 days ago

On the topic of dvc, does anyone have any experiences with dagshub (https://dagshub.com/) that they are interested in sharing?
