by unqueued 1946 days ago
Have you looked into git-annex?

git-annex lets you track references to binary files, using git only to store references to file hashes.
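
To make the "references to file hashes" idea concrete, here is a minimal Python sketch of how a content-derived key could be built. The `SHA256E-s<size>--<hash><ext>` layout follows what git-annex's SHA256E backend appears to produce, but treat the exact format as an assumption; `annex_style_key` is a hypothetical helper for illustration, not git-annex's own code.

```python
import hashlib
import os


def annex_style_key(data: bytes, filename: str) -> str:
    """Build a content-addressed key in the style of git-annex's SHA256E backend.

    Hypothetical helper: git-annex computes its keys internally, and the
    exact key layout used here is an assumption.
    """
    digest = hashlib.sha256(data).hexdigest()
    ext = os.path.splitext(filename)[1]  # e.g. ".txt", or "" when there is none
    return f"SHA256E-s{len(data)}--{digest}{ext}"


if __name__ == "__main__":
    # Git would version only a small pointer containing this key,
    # while the file content itself lives elsewhere.
    print(annex_style_key(b"hello", "greeting.txt"))
```

The point of the sketch is that the key depends only on the content (plus size and extension), so the git history stays small no matter how large the files are.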

And you can use custom backends to efficiently store differential data.

For example, I have an annex repo that stores about 150G of text files, but it uses bup [1] to compress it down to about 20G, and I can still access different versions via git.

1: https://git-annex.branchable.com/special_remotes/bup/


Impressive numbers! Unfortunately I know git-annex only on paper. I gave it a try a while ago, but it was a bumpy start, admittedly most likely due to user error. Would you mind sharing some details about it (e.g. file counts, etc.)? Can I invite you for a chat? It doesn't need to be long, but this might be better suited to a chat.
Sure, how would you like to get in touch? You have a Discord, right? I was actually looking at your project and was thinking of opening a simple PR. (same username)

I have some more example git-annex repos:

This is an annex repo I made of this popular abandonware website:

https://github.com/unqueued/repo.macintoshgarden.org-fileset

And some podcasts:

https://github.com/unqueued/radiolab-fileset

https://github.com/unqueued/ratholeradio-archive

What's cool is that people can use standard pull requests to add files to the repo. And the repo itself is small, but it can represent huge filesets. Datalad has some really fascinating medical imaging data repos that are massive (https://www.datalad.org/datasets.html).

If you wanna see a really good example of a repo with versioned binary files, check out the git-annex repo of previous git-annex binary releases:

https://downloads.kitenet.net/.git/

You can just use standard git workflows to see previous revisions of a file (well, previous hashes), and it is really easy to hook into.
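
As a rough sketch of what "hooking into" that history could look like, the Python below walks `git log` for one path and prints the annex key that each commit pointed at. It assumes the annexed file is stored in git as a symlink or pointer file whose last path component is the key; `key_from_pointer` and `key_history` are hypothetical helper names, and only plain `git log` and `git show` are used.

```python
import os
import subprocess


def key_from_pointer(pointer: str) -> str:
    """Extract the annex key, assumed to be the last path component of the pointer."""
    return os.path.basename(pointer.strip())


def key_history(repo: str, path: str) -> list[tuple[str, str]]:
    """Return (commit, key) pairs for commits that touched `path`, newest first."""
    revs = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    history = []
    for rev in revs:
        # For a symlink, `git show <rev>:<path>` prints the link target.
        shown = subprocess.run(
            ["git", "-C", repo, "show", f"{rev}:{path}"],
            capture_output=True, text=True,
        )
        if shown.returncode != 0:
            continue  # the path does not exist at this commit (e.g. it was deleted)
        history.append((rev, key_from_pointer(shown.stdout)))
    return history


if __name__ == "__main__":
    # "some/file.bin" is a placeholder path, not a file from any repo above.
    for commit, key in key_history(".", "some/file.bin"):
        print(commit[:10], key)
```

Each key printed this way identifies one past version of the content, which can then be fetched from whichever remote holds it.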

Very excited for a PR. Any help and support is very welcome. :-)

I just cloned one of the repos; it seems I really should look more into annex. Feel free to join the Discord channel; that would be the easiest way to go from there.