Hacker News
by kstenerud 1507 days ago
There comes a point where the complexity itself becomes too much of a liability. It's important to be able to trust these algorithms as well as all popular implementations with your data.

One should verify the integrity of things like backups and archives anyway: give the end user a sha1 or better hash of the compressed/encrypted archive and of every file it contains, and regularly check that they all still match.
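For anyone who wants to try the per-file half of this, here is a minimal sketch in Python. It assumes SHA-256 and an in-memory dict as the manifest; the function names (`file_digest`, `build_manifest`, `verify_manifest`) are made up for illustration and are not from any backup tool.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algo="sha256"):
    """Map each file's path (relative to root) to its hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algo)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest, algo="sha256"):
    """Return sorted relative paths that are missing or whose digest changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p, algo) != expected:
            bad.append(rel)
    return sorted(bad)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "a.txt").write_bytes(b"hello")
        manifest = build_manifest(d)
        print(verify_manifest(d, manifest))  # nothing has changed yet
        (Path(d) / "a.txt").write_bytes(b"hellp")
        print(verify_manifest(d, manifest))  # a.txt no longer matches
```

The same `file_digest` call on the archive itself covers the other half. The manifest has to be stored somewhere the archive's failure mode cannot reach, otherwise both rot together.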
Yes...though I'd say to rule out sha1, or any other "no longer considered secure" hashes. The space & time savings (vs, say, sha-512) are really not worth baking into your backup format & procedures. Keep in mind that you might need to really verify the integrity of your backup during a ransomware incident, or as part of a high-stakes legal situation, or ...
SHA1 and even MD5 are broken for collision resistance, not second preimage resistance (which is what matters for backups). While it can still make sense to not use SHA1 for anything, it is fine if you do use it. Blake3 is the one to use if you are looking for maximum speed, although it is newer and some may avoid it for that. In a quick test (using hyperfine -w 1 -m 3) on an 8GB file (Arch Linux, i5-6260U 2 core processor with hyperthreading disabled) times are 1.995s for b3sum, 11.847s for sha1sum, 14.495s for b2sum, 16.903s for sha512sum, and 24.737s for sha256sum. md5sum for comparison is 16.492s, just a tiny bit faster than sha512sum, so that is why you rarely see it these days even for things like backups. SHA3-224 (sha3sum) took 39.531s and SHA3-512 (sha3sum -a 512) took 73.164s, although they will eventually be very quick with hardware support.
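If you want a rough relative ranking on your own machine without hyperfine, something like the sketch below works. Caveats: it times Python's hashlib on an in-memory buffer, not the coreutils binaries reading a file, so absolute numbers will not match the ones above. BLAKE3 is left out because it is not in hashlib. `time_hashes` is a name made up for this example.

```python
import hashlib
import time


def time_hashes(data, names, repeats=3):
    """Return the best wall-clock time in seconds for each algorithm name."""
    results = {}
    for name in names:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    buf = bytes(8 * 1024 * 1024)  # 8 MiB of zeros; content does not affect speed
    algos = ["md5", "sha1", "sha256", "sha512", "blake2b", "sha3_224", "sha3_512"]
    timings = time_hashes(buf, algos)
    # Print fastest first.
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {secs * 1000:8.1f} ms")
```

The ordering depends heavily on the CPU: chips with SHA extensions make sha1 and sha256 much faster than the figures from the i5-6260U would suggest.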

xz -C sha256 will use sha256 on the uncompressed data.
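The same check is reachable from Python's standard lzma module via `check=lzma.CHECK_SHA256`. A small sketch, assuming your liblzma build includes SHA-256 support; the helper names are made up for the example. It also shows that the stored check value is the SHA-256 of the uncompressed bytes.

```python
import hashlib
import lzma


def compress_with_sha256(data):
    """Produce an .xz stream whose integrity check is SHA-256."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_SHA256)


def decompress_checked(blob):
    """Decompress one .xz stream; return (data, check id used by the stream)."""
    d = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    out = d.decompress(blob)
    if not d.eof:
        # The incremental decompressor does not raise on truncated input itself.
        raise lzma.LZMAError("stream ended before the end-of-stream marker")
    return out, d.check


def stored_check_matches(data, blob):
    """True if the raw SHA-256 of the uncompressed data appears in the stream."""
    return hashlib.sha256(data).digest() in blob


if __name__ == "__main__":
    payload = b"backup payload " * 1000
    blob = compress_with_sha256(payload)
    data, check = decompress_checked(blob)
    print(data == payload, check == lzma.CHECK_SHA256)
    print(stored_check_matches(payload, blob))
```

Decompression fails with `lzma.LZMAError` if the decoded data does not match the stored check, so corruption is caught at restore time without a separate checksum file. That is still only a corruption check, not protection against deliberate tampering, since an attacker can rewrite the check too.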

Personally, I use mtree inside the backup file (from pkgsrc bootstrap on Linux), although it has trouble with a few Unicode file names. That kind of tool is great and I'm not sure why there doesn't seem to be an equivalent in the Linux world.

Ideally, yes. Yet logrotate uses zlib to compress log files and deletes the originals. That's how trusted it is.