by ThomasWaldmann 1273 days ago
I see quite some full backups in your near future. ;-)

And that is one of the main reasons why chunk-deduplicating backup tools (like borg, restic, ...) are better than full/incremental style ones.

2 comments

Yeah, the process is to run a full backup weekly and keep four weeks of backups. This mimics the old rsnapshot system, but sends the data directly to S3. In this particular case, the total amount of data is less than 50GB, so the cost of a weekly full backup is small. The goal was to eliminate the dedicated backup server which ran pull backups via rsync/rsnapshot. Having each host back itself up to S3 directly is cheaper than paying for a backup host (including the time spent maintaining the backup host itself). As far as I understand, borgbackup requires a dedicated backup server.
Is there really a difference between deduplication and an infinite chain of incrementals, in this respect? Other than the restore process?
deduplicating backup tools use a symmetric approach: all archives needing some chunk (piece of data) reference it via its chunk id (a keyed hash over the chunk's plaintext content).

so you can delete ANY backup archive without influencing any other backup archive. a chunk is only deleted once nothing references it any more.

also, each backup is logically a FULL backup (it has ALL files, references ALL content data). it is just made in a clever way that avoids re-transferring data already present in the backup repository, so it FEELS incremental (in terms of speed and the amount of CPU and I/O used).
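
to make that concrete, here is a minimal toy sketch, not how borg or restic are actually implemented. assumptions: an in-memory repository, tiny fixed-size chunks (real tools use content-defined chunking and an on-disk or remote repository), and hypothetical names (`Repo`, `create`, `delete`, `restore`). it shows that a second archive only adds chunks the repository does not have yet, and that deleting the older archive leaves the newer one fully restorable.

```python
import hashlib
import hmac

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


class Repo:
    """Toy in-memory repository: a chunk store plus archives referencing chunks."""

    def __init__(self, key: bytes):
        self.key = key
        self.chunks = {}    # chunk id -> chunk data
        self.archives = {}  # archive name -> ordered list of chunk ids

    def chunk_id(self, data: bytes) -> str:
        # keyed hash (HMAC-SHA256 here, as an assumption) over the chunk plaintext
        return hmac.new(self.key, data, hashlib.sha256).hexdigest()

    def create(self, name: str, content: bytes) -> int:
        """Store a logically full archive; return the bytes actually added."""
        ids, added = [], 0
        for i in range(0, len(content), CHUNK_SIZE):
            piece = content[i:i + CHUNK_SIZE]
            cid = self.chunk_id(piece)
            if cid not in self.chunks:  # only unseen chunks are stored
                self.chunks[cid] = piece
                added += len(piece)
            ids.append(cid)
        self.archives[name] = ids
        return added

    def restore(self, name: str) -> bytes:
        return b"".join(self.chunks[cid] for cid in self.archives[name])

    def delete(self, name: str) -> None:
        """Drop one archive, then drop chunks that no archive references."""
        del self.archives[name]
        still_used = {cid for ids in self.archives.values() for cid in ids}
        for cid in list(self.chunks):
            if cid not in still_used:
                del self.chunks[cid]


repo = Repo(key=b"secret")
monday = b"aaaabbbbcccc"
tuesday = b"aaaabbbbdddd"           # only the last chunk changed

added_mon = repo.create("monday", monday)    # all 12 bytes are new
added_tue = repo.create("tuesday", tuesday)  # only the 4 changed bytes are new

repo.delete("monday")               # the older archive goes away ...
restored = repo.restore("tuesday")  # ... the newer one is still complete
```

the deletion here rescans all archives to find unreferenced chunks, which is fine for a toy; a real tool would track references more efficiently.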

OTOH, full/incremental style backup tools build a chain in which each incremental archive depends on the previous incremental and, ultimately, on the full backup. the longer the chain gets, the more fragile it gets.

because of that, and also because you might want to delete older backups at some point, you are forced to create new full backups regularly (causing lots of CPU and I/O load).
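
the chain dependency can be sketched the same way. this is a simplified, hypothetical model (a list of dicts with `changed` and `deleted` entries, and a made-up `restore_chain` function), not the format of any particular tool. it only illustrates that restoring the newest state means replaying every archive since the full backup, so a lost or deleted link in the middle breaks everything after it.

```python
def restore_chain(chain, upto):
    """Rebuild the file set at index `upto` by replaying the chain from the full backup."""
    state = {}
    for archive in chain[:upto + 1]:
        if archive is None:
            raise RuntimeError("chain broken: an earlier archive is missing")
        state.update(archive["changed"])   # new or modified files
        for path in archive["deleted"]:    # files removed since the previous archive
            state.pop(path, None)
    return state


chain = [
    {"changed": {"a.txt": "v1", "b.txt": "v1"}, "deleted": []},  # full backup
    {"changed": {"a.txt": "v2"}, "deleted": []},                 # incremental 1
    {"changed": {"c.txt": "v1"}, "deleted": ["b.txt"]},          # incremental 2
]

latest = restore_chain(chain, 2)  # needs all three archives

chain[1] = None  # simulate losing (or deleting) the middle incremental
try:
    restore_chain(chain, 2)
    broken = False
except RuntimeError:
    broken = True  # incremental 2 is unusable without incremental 1
```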