by remram 1352 days ago
Yes it definitely serves a valid use-case, I feel like someone should try and bring some competition there. A modern equivalent with fewer gotchas, maybe in Rust/Go, maybe using a fuse mount and content-defined chunking (borg/restic/...-style) would be amazing.

I'd love to see a well-supported, git-lfs-compatible client/proxy (so you could more easily move backends) that could run on top of S3/object storage. Yes, and written in a modern language like Go/Rust for performance and parallelism. There are some node.js and various other git-lfs proxies out there, but they are not maintained well enough that I could count on them being around and working in another 5 years. git-annex at least has been around for a while, even though it has its issues.
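To make the "proxy on top of S3" idea a bit more concrete, here is a rough sketch of the core of such a proxy: answering a git-lfs batch request with one link per object. The request/response shape follows the git-lfs batch API with the "basic" transfer adapter as I understand it. The `presign` callback and the `lfs/…` key layout are hypothetical stand-ins for whatever your object store provides (for S3, something that returns a presigned GET or PUT URL). Not a complete server, just the part that maps oids to object storage.

```python
from typing import Callable


def batch_response(
    request: dict,
    presign: Callable[[str, str], str],
    expires_in: int = 3600,
) -> dict:
    """Build a git-lfs batch API response for a parsed batch request.

    `presign(operation, key)` is a hypothetical callback that returns a
    time-limited URL for the given object key.
    """
    operation = request["operation"]  # "download" or "upload"
    objects = []
    for obj in request["objects"]:
        oid = obj["oid"]
        # Hypothetical key layout: fan out by oid prefix.
        key = f"lfs/{oid[:2]}/{oid[2:4]}/{oid}"
        objects.append(
            {
                "oid": oid,
                "size": obj["size"],
                "authenticated": True,
                # A real server would omit the upload action for objects
                # it already has, and return an error for missing downloads.
                "actions": {
                    operation: {
                        "href": presign(operation, key),
                        "expires_in": expires_in,
                    }
                },
            }
        )
    return {"transfer": "basic", "objects": objects}
```

The client then talks to the object store directly using the returned links, so the proxy itself never has to stream the large files.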

Huggingface uses git-lfs for large datasets with good success. git-lfs on GitHub gets very pricey at higher volumes of data. Would love the affordability of object storage, just with a better git blob storage interface, that will be around in the future.

Most of these systems do their own hash calculations and are not interchangeable with each other. I feel like git-lfs has the momentum in data science at the moment, but it needs some better options for people who want a low-cost storage option that they can control.
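As an illustration of the interchangeability point: below is a small sketch of what git-lfs stores in the repo in place of a large file (a pointer with a SHA-256 oid and a size), next to a git-annex style key. The pointer format follows the git-lfs v1 pointer spec. The annex key format is my understanding of its SHA256E backend, so treat that part as an assumption. Both function names are made up for this example.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a git-lfs v1 pointer file for `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def annex_style_key(content: bytes, ext: str = "") -> str:
    """Return a git-annex style key (assumed SHA256E layout)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"SHA256E-s{len(content)}--{digest}{ext}"
```

Even when the underlying hash is the same, the identifiers and on-disk layouts differ, and chunk-based tools hash pieces of a file rather than the whole file, so one tool's store cannot simply be read by another.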

Huggingface is great, but it's one more service to onboard if you're in an enterprise. And data privacy/retention/governance means that many people would like their data to reside on their own infrastructure.

If AWS were to give us a low cost git-lfs hosted service on top of S3 it would be very popular.

If anyone knows of some good alternatives, please let us know!

Did some more research to see if anything had changed in this space. I found two interesting projects (haven't used them myself yet though):

One in C# (with support for auth)

https://github.com/alanedwardes/Estranged.Lfs

One in Rust (but no auth, so you have to run a reverse proxy in front of it)

https://github.com/jasonwhite/rudolfs

Both seem interesting. Anyone use these?

I work with a lot of uncompressed structured binary files so I finally broke down and wrote my own system based on the Restic chunker: https://github.com/akbarnes/dupver It's pretty basic, but it works for me and will hopefully inspire someone to make a "real" data VCS based on content-defined chunking.
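For readers who haven't run into content-defined chunking: the idea is to cut a file wherever a rolling hash of the last few bytes hits a chosen pattern, so boundaries follow the content rather than fixed offsets. A minimal sketch is below. Note that it uses a simple gear-style rolling hash, not the Rabin fingerprint that the Restic chunker uses, and the size parameters are arbitrary values picked for the example.

```python
import hashlib
import random

# Illustrative parameters only (real tools use much larger chunks).
MIN_SIZE = 64      # never cut a chunk shorter than this
MAX_SIZE = 4096    # always cut once a chunk reaches this length
BITS = 8           # cut when the top 8 hash bits are zero (~1 in 256 positions)

# Fixed table of pseudo-random 32-bit values, one per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes):
    """Yield chunks of `data` with content-defined boundaries."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left ages out old bytes: after 32 steps a byte no
        # longer influences the 32-bit hash.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_SIZE:
            continue
        if (h >> (32 - BITS)) == 0 or length >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def chunk_ids(data: bytes):
    """Return the SHA-256 id of each chunk, in order."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]
```

The payoff is deduplication across versions: if you insert a byte near the start of a file, the boundaries typically resynchronize after a chunk or so, and the remaining chunks keep the same ids, so only the changed chunks need to be stored again. With fixed-size blocks, every block after the insertion would shift and change.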