Hacker News
by Too 1217 days ago
People normally say don’t store binaries in git. Is this a big issue if the files don’t change very often? From what I understand, the biggest problem is that they don’t diff well. Since photos rarely change, can it work?

Anyone tried using git for 500G of photos?

I would love to if it worked, I have my photo collection spread out on multiple computers and merging the edits to the master backup is always a pita. “Was this file removed from copy A or added to copy B”? All those problems just solve themselves with a clear DVCS git history.
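
To make that ambiguity concrete, here is a minimal sketch (not taken from any tool mentioned in this thread) that compares two copies of a collection by content hash. The names `hash_tree` and `compare_copies` are made up for illustration. It can tell you a file exists in only one copy, but with no history it has no way to say whether that was a delete on one side or an add on the other.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map SHA-256 content hash -> set of relative paths under root."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # a real tool would hash in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rel = path.relative_to(root).as_posix()
            result.setdefault(digest, set()).add(rel)
    return result


def compare_copies(copy_a, copy_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = hash_tree(copy_a), hash_tree(copy_b)
    only_a = sorted(p for h in a.keys() - b.keys() for p in a[h])
    only_b = sorted(p for h in b.keys() - a.keys() for p in b[h])
    return only_a, only_b
```

Comparing by content rather than by name means a renamed photo is not reported as a difference, but an edited photo shows up on both sides.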

Never being able to remove photos to free up space is, of course, another problem with git.

3 comments

Git simply wasn't designed for that. The key issue with storing binaries in it is the one you mentioned last: the way git works, a full clone carries the full history of every file. Deleting a file in git doesn't remove it from history, so a fresh full clone of 500G of photos isn't going to be 500G; it's going to be that plus every older version and every deleted file still in history. A shallow clone gets around that, and shallow clones supposedly work better in recent versions of git, but fundamentally you're using a hammer on screws, as it were.
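
A rough way to see the size problem is a back-of-the-envelope sketch like the one below. The commit history, content ids and sizes are all hypothetical, and the model ignores git's compression and delta packing, which usually save little on already-compressed formats like JPEG.

```python
def full_clone_bytes(history):
    """Total bytes of every distinct file version anywhere in history.

    history is a list of commits, each a dict of
    path -> (content_id, size_in_bytes). Identical content is stored
    once, so each content_id is counted once, whether or not the file
    still exists in the latest commit.
    """
    seen = {}
    for commit in history:
        for content_id, size in commit.values():
            seen[content_id] = size
    return sum(seen.values())


def checkout_bytes(history):
    """Bytes of the files present in the latest commit only."""
    return sum(size for _, size in history[-1].values())


# Hypothetical three-commit history, sizes in bytes.
history = [
    {"a.jpg": ("a1", 5_000_000), "b.jpg": ("b1", 4_000_000)},
    {"a.jpg": ("a2", 5_100_000), "b.jpg": ("b1", 4_000_000)},  # a.jpg re-edited
    {"a.jpg": ("a2", 5_100_000)},                               # b.jpg deleted
]

print(checkout_bytes(history))    # 5100000
print(full_clone_bytes(history))  # 14100000
```

In this toy history the working tree holds one 5.1 MB photo, while a full clone still carries the old edit of a.jpg and the deleted b.jpg.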

If you're open to new tools, git annex is what you're looking for. The other two options are Subversion, which has some DVCS features these days, or Perforce Helix Core (paid), though I can't vouch for it as I've never used it.

We've been working on some open source tooling called "oxen" that was built for large datasets of images, video, audio, text etc. We wanted to solve the exact problem you're flagging here with git.

We would love any feedback! Feel free to check it out here: https://github.com/Oxen-AI/oxen-release#-oxen

I guess a 500GB repository would be barely usable.

You should check out Git LFS if you want to do that, as it sounds like a good idea in the first place!
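
For context on why Git LFS helps: it keeps the large file out of the repository and commits a small text pointer in its place. The sketch below only illustrates the pointer format from the Git LFS spec; the function name `lfs_pointer` is made up, and in practice the real client generates these for you once you run `git lfs track` on a file pattern.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for real photo bytes, just to show the shape of the pointer.
print(lfs_pointer(b"pretend this is a 20 MB photo"))
```

The pointer stays around a hundred-odd bytes no matter how big the photo is, so the git history itself stays small; the actual bytes live in separate LFS storage.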