by e12e 3255 days ago
It's a little odd to not benchmark backup over the network - a backup taken to the same physical disk as the source of the data isn't very useful - for that use-case taking a filesystem snapshot[s] would probably be faster and more useful. Perhaps in combination with a checksumming tool, like [c], or with a filesystem like ZFS.
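
To make the "snapshot plus checksumming tool" idea concrete, here is a minimal sketch of what such a tool does at its core: record a digest per file, then compare two runs. It is not how Tripwire, AIDE or integrit are actually implemented; `build_manifest` and `changed_files` are hypothetical names, and SHA-256 is just an assumed choice of hash.

```python
import hashlib
import os

def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB blocks so large files do not need to fit in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest

def changed_files(old, new):
    """Return paths that were added, removed, or modified between two manifests."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Running `build_manifest` against a snapshot and again against a later snapshot, then passing both results to `changed_files`, lists what differs. A filesystem like ZFS does this kind of verification at the block level instead.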

Also, it can be difficult in a lot of environments to sustain more than 100 Mbps of writes to a remote, off-site system. In that case, halving the stored data can be a much bigger win.
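
As a rough back-of-the-envelope check of that point, the sketch below computes the transfer time for a backup over a 100 Mbps link, before and after halving the data. The 500 GB size is a made-up figure for illustration, and `transfer_hours` is a hypothetical helper; it ignores protocol overhead and assumes the link is fully saturated.

```python
def transfer_hours(size_gb, link_mbps):
    """Hours needed to send size_gb (decimal gigabytes) over a link_mbps link."""
    bits = size_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1000**2)
    return seconds / 3600

full = transfer_hours(500, 100)    # hypothetical 500 GB backup set
halved = transfer_hours(250, 100)  # same data stored at half the size
print(f"{full:.1f} h vs {halved:.1f} h")  # prints "11.1 h vs 5.6 h"
```

On a link that slow, the time saved by storing half as much is likely to outweigh a moderate difference in local processing speed.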

All that said, it's interesting to see that a) duplicity seems slow, and b) very consistent in terms of speed. I wonder if there's some low-hanging fruit for optimization there.

Personally I've had some luck using backupninja[b] in combination with duplicity. It's one of the few Free alternatives that allow the backup-system to encrypt "one-way" - so that compromising the backup-system doesn't immediately give read access to encrypted backups. It's a bit complicated to set it up for separate encrypt-to and signing keys though :/

[s] Today I would probably recommend ZFS - but I've always wanted to give NILFS2 a real test, especially on solid-state disks: http://nilfs.osdn.jp/en/

[c] https://github.com/Tripwire/tripwire-open-source

http://aide.sourceforge.net/

https://github.com/integrit/integrit (Speaking of projects that might be fun/useful to redo in a safe language like Rust or Go - this looks like a prime example, btw. On the whole, moving integrity checking into the filesystem, as with ZFS, might be the better option, though.)

[b] https://0xacab.org/riseuplabs/backupninja

1 comment

> All that said, it's interesting to see that a) duplicity seems slow, and b) very consistent in terms of speed. I wonder if there's some low-hanging fruit for optimization there.

Duplicity is a classic delta-backup tool. It always reads all files and calculates a delta against an earlier version of each file, hence the fairly consistent performance. The performance of deduplicating archivers is more difficult to predict.
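
A toy sketch of that difference, under heavy simplifying assumptions: the delta-style pass just counts bytes read, and the deduplicating store uses tiny fixed-size chunks (real deduplicating archivers generally use more elaborate chunking). `delta_style_pass` and `dedup_store` are hypothetical names and do not reflect duplicity's or any other tool's actual code.

```python
import hashlib

CHUNK = 4  # tiny fixed chunk size, for illustration only

def delta_style_pass(files):
    """Read every file in full, changed or not; return the number of bytes read."""
    return sum(len(data) for data in files.values())

def dedup_store(files, store):
    """Add chunks to a content-addressed store; return the number of new bytes stored."""
    added = 0
    for data in files.values():
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in store:  # only previously unseen content costs anything
                store[key] = chunk
                added += len(chunk)
    return added

files = {"a": b"aaaabbbb", "b": b"aaaacccc"}
store = {}
print(delta_style_pass(files), delta_style_pass(files))      # prints "16 16"
print(dedup_store(files, store), dedup_store(files, store))  # prints "12 0"
```

The delta-style cost is the same on every run, while the deduplicating cost depends on how much of the content has been seen before, which is why it is harder to predict.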