by Arcuru 383 days ago
48 hours is a crazy amount of time to spend just to compress a git folder, it's only a couple GB. 41 minutes still seems like quite a long time.

Why aren't they just snapshotting and archiving the full git repo? Does `git bundle` add something over frequent ZFS backups?


> Be aware that even with these recommendations, syncing in this way has some risk since it bypasses Git’s normal integrity checking for repositories, so having backups is advised. You may also wish to do a git fsck to verify the integrity of your data on the destination system after syncing.

https://git-scm.com/docs/gitfaq#_transfers

It doesn't tell you how to make a backup safely though.
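One way this could look, as a rough sketch rather than an officially recommended procedure: check the source repo with `git fsck`, write everything into a bundle, then verify the bundle. The function names and paths below are made up for illustration; the git subcommands (`fsck --full`, `bundle create ... --all`, `bundle verify`) are the real ones. The bundle path should be absolute, because `-C` changes the working directory before git resolves it.

```python
import subprocess


def bundle_commands(repo: str, bundle_path: str) -> list[list[str]]:
    """Return the git commands for a full-bundle backup, in execution order."""
    return [
        # Check object integrity in the source repository first.
        ["git", "-C", repo, "fsck", "--full"],
        # Write every ref and the objects reachable from it into one file.
        ["git", "-C", repo, "bundle", "create", bundle_path, "--all"],
        # Confirm the bundle file is valid before trusting it as a backup.
        ["git", "-C", repo, "bundle", "verify", bundle_path],
    ]


def backup_repo(repo: str, bundle_path: str) -> None:
    """Run each step and stop at the first failure (hypothetical helper)."""
    for cmd in bundle_commands(repo, bundle_path):
        subprocess.run(cmd, check=True)
```

Calling `backup_repo("/srv/repo.git", "/backup/repo.bundle")` would run the three steps in order; the resulting file can be copied anywhere, since it is a single ordinary file.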

On a personal scale, Syncthing and Btrfs snapshots work plenty good enough. It's as fast as the storage/network too.

Syncthing is the only way I've ever corrupted a git repo before
I think that's why they specified the "BTRFS snapshots" part. Yes, directly syncing a .git directory seems like a recipe for disaster with how often I've seen individual files lagging to sync, but I guess with BTRFS snapshots one can ensure that only a consistent view of a git directory is being backed up and synced.
Nah I truly do it the wrong way around. Syncthing on the git repos. And one of my devices in the Syncthing cluster takes btrfs snapshots every minute for recovery and further backups.

Because it's at a personal scale, the only time I can corrupt a git repo is if I work on the same repo (and its workdir) from more than one device in the time it takes for Syncthing to replicate the changes.

But even then it's not a big deal because git fsck is quick. And I have my snapshots, and the Syncthing versioning, and git defaults to two weeks before pruning. And because of how git works, using hashes to identify contents, files are not easily overwritten either.

In 10 years I only had one git corruption (I ran a command on the same repo on a different machine via ssh, yielding a Syncthing conflict). Syncthing kept copies of the conflict file. One commit disappeared from the history but not from the database. It was easy to rebase the changes. I think I used git fsck to delete the Syncthing versioned files.

If filesystem snapshots weren't safe, wouldn't that also mean git is prone to corrupting your repo in the event of a power loss or crash? That seems like a bad bug.
ZFS snapshots are difficult to replicate offsite to non-ZFS destinations, such as an S3 bucket.

That said, there's another lesser-known feature that bundles help with when used with `git clone --bundle-uri`. The client can specify the location of a bundle, or the server can send the client the bundle location in the clone results. The client then fetches the bundle, unpacks it, and updates the delta via the git server, so it's a lot lighter on the server for cloning large repos and a ton faster for the client on initial clones.
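For the client-specified case, a minimal sketch might look like the following. The function names and example URLs are hypothetical; `--bundle-uri=<uri>` is the actual `git clone` option, which as far as I know needs a fairly recent Git (2.38 or later).

```python
import subprocess


def clone_with_bundle_command(repo_url: str, bundle_uri: str, dest: str) -> list[str]:
    """Build a clone command that seeds the repo from a bundle first."""
    # Git downloads and unpacks the bundle, then fetches only the
    # remaining objects from repo_url.
    return ["git", "clone", f"--bundle-uri={bundle_uri}", repo_url, dest]


def clone_with_bundle(repo_url: str, bundle_uri: str, dest: str) -> None:
    """Run the clone (hypothetical helper; raises if git exits non-zero)."""
    subprocess.run(clone_with_bundle_command(repo_url, bundle_uri, dest), check=True)
```

For example, `clone_with_bundle("https://git.example.com/big.git", "https://cdn.example.com/big.bundle", "big")` would pull the bulk of the history from the CDN and only the recent commits from the git server.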

I think if you want consistent snapshot backups on non-zfs destinations the safest thing is to clone the snapshot and rsync from the clone. Not a single-step operation but preserves the atomicity of the snapshot.

EDIT: you could also rsync from a .zfs snapshot directory if you have them enabled.
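A sketch of the `.zfs` variant, assuming the snapshot directory is reachable under the dataset's mountpoint. The function name, dataset mountpoint, snapshot name, and destination are all made up for illustration.

```python
import subprocess
from pathlib import PurePosixPath


def snapshot_rsync_command(mountpoint: str, snapshot: str, dest: str) -> list[str]:
    """Build an rsync command that copies from a read-only ZFS snapshot view."""
    src = PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot
    # Trailing slash: copy the directory's contents, not the directory itself.
    return ["rsync", "-a", "--delete", f"{src}/", dest]


def rsync_snapshot(mountpoint: str, snapshot: str, dest: str) -> None:
    """Run the copy (hypothetical helper; raises if rsync exits non-zero)."""
    subprocess.run(snapshot_rsync_command(mountpoint, snapshot, dest), check=True)
```

Because the source is a snapshot, rsync sees one frozen point-in-time view even if the live repo changes during the copy.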

ZFS can send to a file or whatever you want to pipe to, you can do incremental sends, and if you convert snapshots to bookmarks on the sender you don't have to keep the historical data after you send it.
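Roughly, the incremental-send-plus-bookmark flow could be sketched like this. Dataset and snapshot names and the helper functions are hypothetical; `zfs send -i` and `zfs bookmark` are the real subcommands, and the incremental source here is a bookmark (`#`) rather than a snapshot (`@`).

```python
import subprocess


def incremental_send_command(dataset: str, prev: str, new: str) -> list[str]:
    """Build `zfs send -i` from bookmark `prev` to snapshot `new`."""
    return ["zfs", "send", "-i", f"{dataset}#{prev}", f"{dataset}@{new}"]


def bookmark_command(dataset: str, snap: str) -> list[str]:
    """Build the command that records a snapshot as a same-named bookmark."""
    return ["zfs", "bookmark", f"{dataset}@{snap}", f"{dataset}#{snap}"]


def send_to_file(dataset: str, prev: str, new: str, out_path: str) -> None:
    """Write the incremental stream to a file, then bookmark the new snapshot."""
    with open(out_path, "wb") as out:
        subprocess.run(
            incremental_send_command(dataset, prev, new), stdout=out, check=True
        )
    subprocess.run(bookmark_command(dataset, new), check=True)
```

Once the bookmark exists, the snapshot itself can be destroyed on the sender, and the next incremental send can still use the bookmark as its starting point. The output file can then be uploaded to any object store.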
... so they added caching to things that should have been cached?

... is this really the way people "back up" git repos? I mean, it is git, so isn't there some way to mirror changes to the repo in another repo and just use ZFS / snapshots / backup software / etc to do that? It's a distributed version control system. Just make sure the version control information is ... distributed?

Reading the article I thought exactly the same! I'd be curious to know how much time the same would take with ZFS.
15 years also seems like a long time