| I can understand in theory why they wouldn't want to back up .git folders as-is. Git has a serious object count bloat problem if you have any repository with a good amount of commit history, which causes a lot of unnecessary overhead in just scanning the folder for files alone. I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat. At the same time Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third backup solution in a three layer strategy (backup directly connected, backup in home, backup external), and that third one is probably the single most important one of them all since it's the one you're going to be touching the least in an ideal scenario. They really can't be excluding any files whatsoever. The cloud service exclusion is similarly bad, although much worse. Imagine getting hit by a cryptoworm. Your cloud storage tool is dutifully going to sync everything encrypted, junking up your entire storage across devices and because restoring old versions is both ass and near impossible at scale, you need an actual backup solution for that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be. |
Ironically, I believe you have it backwards: pack files, git's solution to the "too many tiny files" problem, are the issue here; not the tiny files themselves.
In my experience, incremental backup software works best with many small files that never change. Scanning is usually just a matter of checking modification times and moving on. This isn't fast, but it's fast enough for backups and can be optimized by monitoring for file changes in a long-running daemon.
However, lots of mostly identical files ARE an issue for filesystems as they tend to waste a lot of space. Git solves this issue by packing these small objects into larger pack files, then compressing them.
Unfortunately, it's those pack files that cause issues for backup software: any time git "garbage collects" and creates new pack files, it ends up deleting and creating a bunch of large files filled with what looks like random data (due to compression). Constantly creating/deleting large files filled with random data wreaks havoc on incremental/deduplicating backup systems.