It looks like in those benchmarks Oxen.AI makes the misguided assumption that benchmarking DVC is (roughly...?) the same as benchmarking DVC<>DagsHub (a server side made by a different company). To my understanding, DagsHub is the bottleneck there. They didn't bother to benchmark DVC against an S3 bucket or similar cloud storage, which is more widely used. I wonder if it's because DagsHub makes this whole setup wayyy slower.
Oxen dev here - let me add some benchmarks for DVC backed by an S3 bucket. I did it a while back and we were still faster, but I agree it's a good benchmark to have.
Fundamentally, even adding and committing data locally is slower with DVC, before any push happens. But I agree the remote matters too.
Where does that push to? Does this benchmark really just measure how well-provisioned various VC-funded websites currently are?
I think a proper benchmark here would be to install the server parts of Oxen, Git-LFS, etc. on the same machine, and then time how long it takes to commit and push the same dataset from some other machine.
Although of course given that we live in an age where people expect to upload their immense datasets to the cloud for some reason, a "proper" benchmark might not be a relevant one. I'm not sure what a really good benchmark of that would be.
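For what it's worth, the same-server setup described above could be timed with something like the rough sketch below. It only measures wall-clock time per step on the client. The helper names (`time_command`, `run_benchmark`) are made up for illustration, and the steps are no-op placeholders: you would swap in the real add, commit, and push commands for each tool, run from the client machine against a server you host yourself.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # check=True makes a failed step raise instead of being timed as a success.
    subprocess.run(
        cmd,
        cwd=cwd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def run_benchmark(steps, cwd=None):
    """Time each (name, command) step in order; returns {name: seconds}."""
    return {name: time_command(cmd, cwd=cwd) for name, cmd in steps}


if __name__ == "__main__":
    # Placeholder steps (assumption): each one just starts Python and exits.
    # Replace them with the tool's actual add / commit / push invocations.
    noop = [sys.executable, "-c", "pass"]
    steps = [("add", noop), ("commit", noop), ("push", noop)]
    for name, seconds in run_benchmark(steps).items():
        print(f"{name}: {seconds:.3f}s")
```

Running the same step list several times on a freshly cloned dataset, and reporting the spread rather than a single number, would make the comparison less sensitive to caching and network noise.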
Will add a local network benchmark as well! There are many reasons to upload your data to the cloud, but I agree that there are use cases where you might just want to version data on your local network.