Hacker News
by philsnow 727 days ago
Have you pointed borgbackup or similar at it? i.e. extract the archive to a specific directory, let borg create an archive of it, and then a month later do the same thing and see if the incremental size is egregiously large? I would expect the overwhelming bulk of data to be media, and those files will consume (nearly) zero incremental space with borgbackup or some other deduplicating backup system.

I don't know what the point would be? You still have to perform the entire Takeout, on a disk that already has a previous Takeout, so you always need double the space, and you always need to spend days (?) downloading terabytes (?) of data.

Once you've downloaded the entire new Takeout, there's no reason to deduplicate -- just delete the old Takeout.

Ah right, it would still take a long time to download.

My use case is that I have a local NAS that I use for backup, but I also want things backed up offsite, so I mirror the backups to b2 (and soon to glacier).

I would download and extract the takeout archive locally, then run borg with the NAS as the borg repo. It tries to dedup and only store incremental data in the Borg repo.

If the takeout data consistently has enough of the same “shape”, the b2/s3 storage would only grow by roughly my incremental takeout archive size, rather than storing 200 more GB every time I export a takeout.
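
To make the "only grows by the incremental size" idea concrete, here's a toy sketch in Python. It is not how borg is implemented: it dedups whole files by SHA-256, whereas borg splits files into variable-size chunks, and `DedupRepo`, the archive names, and the file contents are all made up for illustration.

```python
import hashlib


class DedupRepo:
    """Toy content-addressed store: each unique file body is kept once."""

    def __init__(self):
        self.blobs = {}     # sha256 hex digest -> file bytes
        self.archives = {}  # archive name -> {path: digest}

    def create(self, name, files):
        """Store a snapshot and return how many new bytes the repo gained."""
        added = 0
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only content the repo has never seen costs storage.
                self.blobs[digest] = data
                added += len(data)
            manifest[path] = digest
        self.archives[name] = manifest
        return added

    def extract(self, name, path):
        """Read one file back out of a named archive."""
        return self.blobs[self.archives[name][path]]


repo = DedupRepo()

# First export: two photos (hypothetical contents).
first = {"Photos/a.jpg": b"A" * 1000, "Photos/b.jpg": b"B" * 2000}
grew_first = repo.create("takeout-1", first)

# The local copy of the old export can go away; the repo keeps what it needs.
del first

# Second export: a.jpg unchanged, b.jpg deleted upstream, c.jpg is new.
second = {"Photos/a.jpg": b"A" * 1000, "Photos/c.jpg": b"C" * 500}
grew_second = repo.create("takeout-2", second)

# The deleted photo is still recoverable from the older archive.
recovered = repo.extract("takeout-1", "Photos/b.jpg")
```

In this sketch the second archive adds only the bytes of `c.jpg`, and `b.jpg` can still be pulled out of `takeout-1` even though it's gone from the newer export and from local disk.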

So yah, it would use a lot of space locally and temporarily, but the idea for me is to minimize cloud storage while still being able to extract files from older takeout archives.

The reason for deduplication and incremental backups is that you can recover accidentally deleted photos.

You don't need to keep the previous backup on the disk; it's enough to have it on the backup destination (at least in the case of borg).