by dom0 3298 days ago
The verdict on the "open source competition" in Duplicacy's README is not entirely accurate. Exclusive locking in the sync'd approach is just the easiest implementation, not the only possible one. I can't speak for other tools, since I don't know their internals well enough, but for Borg (http://www.borgbackup.org/) I can say that there is no inherent obstacle to running the important parts of making backups (i.e. uploading and deduplicating data) in parallel. It's just not implemented.

Cloud storage back-ends are a somewhat similar story. Supporting them wouldn't be that complex, although locking is a problem due to the eventual consistency (EC) model of most of these services. Plans to enable this have existed for quite some time now; there has just been no time to implement them, and other features are requested more frequently.

2 comments

As a user, I don't give credit for features that could be implemented. Somehow they found time to implement this feature and Borg didn't, so they are legitimately ahead in that aspect.
I might be wrong, but I want to hear more from you if you're a Borg developer. My understanding is that you may be able to have multiple clients uploading chunks at the same time, but you won't be able to exploit cross-client deduplication if different clients have a similar set of files (OS files or a large code base, for instance). Moreover, if your implementation requires locks, then it would be very hard to extend to cloud services.
Yes, that's right, concurrent addition of the same chunks would generally mean that some work is wasted; so concurrent long running jobs would not synchronize well in this model, and lock-free performs clearly better there.
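
To make the wasted work concrete, here is a minimal sketch of a content-addressed chunk store. It is not Borg's actual code: `ChunkStore`, `backup`, and the plain SHA-256 chunk ID are all hypothetical names and choices made for illustration. Two clients that start from the same (stale) view of the repository both upload the same chunks; the store ends up with one copy of each, but the bytes went over the wire twice.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory chunk store keyed by content hash."""

    def __init__(self):
        self.chunks = {}
        self.bytes_uploaded = 0  # counts all upload traffic, including duplicates

    def upload(self, chunk_id, data):
        self.bytes_uploaded += len(data)
        self.chunks[chunk_id] = data  # idempotent: same ID always maps to same data


def chunk_id(data):
    # Assumption: a plain SHA-256 of the chunk contents serves as the ID.
    return hashlib.sha256(data).hexdigest()


def backup(store, chunks, known_ids):
    """Upload every chunk whose ID is not in this client's view of the repo."""
    for data in chunks:
        cid = chunk_id(data)
        if cid not in known_ids:
            store.upload(cid, data)
            known_ids.add(cid)


os_files = [b"a" * 10, b"b" * 20]

# Concurrent case: both clients took their snapshot of the index before
# either one had uploaded anything, so both see an empty repository.
store = ChunkStore()
backup(store, os_files, known_ids=set())
backup(store, os_files, known_ids=set())
print(len(store.chunks), store.bytes_uploaded)  # 2 chunks stored, 60 bytes sent

# Sequential case: the second client sees the first client's chunks.
store2 = ChunkStore()
backup(store2, os_files, known_ids=set())
backup(store2, os_files, known_ids=set(store2.chunks))
print(len(store2.chunks), store2.bytes_uploaded)  # 2 chunks stored, 30 bytes sent
```

The result is correct in both cases; only the upload cost differs, which is the sense in which long-running concurrent jobs don't synchronize well.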

The only operation which inherently has to be guarded by a lock in Borg is inserting the archive pointer into the manifest (root object, see https://borgbackup.readthedocs.io/en/latest/internals/data-s...). I suppose it would be possible to work around that without locking or to use the usual hacks around EC, put/get/check/get/check?put/get/check?put etc. until it's "probably there".
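
A rough sketch of that put/get/check loop, under stated assumptions: `LaggyStore` is a made-up in-memory stand-in for an eventually consistent service (a new object stays invisible for a fixed number of reads), and `put_until_visible` is a hypothetical helper, not anything that exists in Borg. It only covers the "is my write visible yet" part; it does nothing about two writers updating the manifest at the same time.

```python
import time


class LaggyStore:
    """Hypothetical store where a new key is hidden for `lag` reads."""

    def __init__(self, lag):
        self.lag = lag
        self._data = {}
        self._hidden = {}  # key -> number of reads that will still miss

    def put(self, key, value):
        self._data[key] = value
        self._hidden.setdefault(key, self.lag)  # re-puts do not reset the lag

    def get(self, key):
        if self._hidden.get(key, 0) > 0:
            self._hidden[key] -= 1
            return None
        return self._data.get(key)


def put_until_visible(store, key, value, checks_per_put=2, max_puts=5, delay=0.0):
    """Put, read back a few times, and re-put if the value is still missing.

    Returns the number of puts that were issued. A successful read-back only
    means the value is "probably there"; other readers may still see stale data.
    """
    for attempt in range(1, max_puts + 1):
        store.put(key, value)
        for _ in range(checks_per_put):
            if store.get(key) == value:
                return attempt
            time.sleep(delay)
    raise TimeoutError(f"{key!r} not visible after {max_puts} puts")


store = LaggyStore(lag=3)
puts = put_until_visible(store, "manifest", b"archive-pointer")
print(puts)
```

In a real deployment `delay` would be non-zero and probably grow between attempts; it defaults to zero here only so the sketch runs instantly.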

Deleting / pruning archives would still require a full lock due to the same conceptual issues that your two-phase GC avoids. The same goes for "check".