by speleo_engr 2984 days ago
I've used XZ to compress tarballs of backups. XZ was useful because it let me store more backups on an external hard drive. I have seen bit rot on some of these files (stored on a magnetic HDD), in the sense that the md5sum of the .tar.xz archive no longer matches the one recorded when it was created. What do you suggest for creating parity/ECC in this case? I'm aware of parchive, but is that the right choice, and in what configuration?
2 comments

Keep in mind I'm not an archival expert, so you should do your own research. That being said, I'm currently using pyFileFixity [1] to generate the hashes and ECC data for my personal backups. I write them to M-Disc Blu-rays using Dvdisaster [2], which can also write additional ECC data. I settled on this setup after a lot of googling and reading this useful Super User question [3] and this extensive answer [4]. I must admit that I am guilty of storing images as JPGs and compressing most of my files in ZIPs for convenience.

[1]: https://github.com/lrq3000/pyFileFixity

[2]: http://dvdisaster.net/en/index.html

[3]: https://superuser.com/q/374609/52739

[4]: https://superuser.com/a/873260/52739
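
For anyone who wants to see the hash half of this in isolation, here is a minimal sketch of recording a digest and checking it later. It is not how pyFileFixity works internally; the function names `file_digest` and `is_intact` are made up for illustration, and SHA-256 is just an assumed choice of hash. It only detects damage, it cannot repair anything.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_intact(path, recorded_digest, algorithm="sha256"):
    """Return True when the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == recorded_digest
```

You would store the output of `file_digest` next to the archive when you create it, then call `is_intact` during periodic checks.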

The whole structural adaptive encoding seems like a massive overcomplication. I feel like clever tricks such as that serve only to bite you in the ass when you need them the most.

Same goes for the bit about JPEG. Sure, it might not be ideal technically, but recommending JPEG2000 (presumably, as there is no JPEG2) with its ridiculously poor software support seems weak too. What use is a robust file that you can't open?

When you're transferring files and need to cope with corrupted or missing chunks, you should use a parity scheme. Others have mentioned that; it's common on Usenet, for example.

If you can't control the underlying storage, then ditto. Keeping and maintaining explicit parity chunks is somewhat inconvenient, but it works.
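
To make "parity chunks" concrete, here is a rough sketch of the simplest possible scheme: a single XOR parity chunk that can rebuild one lost chunk. This is an assumption-laden toy, not what Parchive does; par2 uses Reed-Solomon codes, which can survive several lost blocks. The names `xor_parity` and `recover_missing` are hypothetical.

```python
def xor_parity(chunks):
    """Return a parity chunk: the byte-wise XOR of equal-length chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("chunks must all be the same length")
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(chunks, parity):
    """Rebuild the single chunk marked None from the survivors and the parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk")
    survivors = [c for c in chunks if c is not None]
    # XOR of all survivors and the parity cancels everything but the lost chunk.
    return xor_parity(survivors + [parity])
```

Note that this only works when you know which chunk is missing or bad, which is why parity data is normally paired with per-chunk checksums.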

But if you just want to avoid bitrot of your own files, sitting on your own HDD, I'd recommend using a reliable storage system instead: ZFS or, at higher and more complicated levels, Ceph/Rook and its kin. That still offers a POSIX interface (unlike parity files), while being just as safe.

If I am using a single HDD, can ZFS still add parity data? That's neat if it can. I assumed parity with ZFS was for something like RAID6 where there are multiple HDDs in a set.

Do any file systems other than ZFS support adding parity in a single-HDD config? Last I checked, getting ZFS on Linux required lots of side-band steps due to licensing issues.

ZFS can do multiple copies of a file on a single hard drive. It is not adding parity.
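
As far as I know this is controlled by the `copies` dataset property. The difference from parity can be sketched roughly like this: keep whole duplicates plus a checksum, and on read use whichever duplicate still matches. This is only an illustration of the idea, not ZFS's actual on-disk logic; `store` and `read` are hypothetical names.

```python
import hashlib

def store(data, copies=2):
    """Keep several copies of a block plus one checksum of the original."""
    return {
        "checksum": hashlib.sha256(data).hexdigest(),
        "copies": [bytearray(data) for _ in range(copies)],
    }

def read(record):
    """Return the first copy whose checksum matches and repair the bad ones."""
    good = None
    for copy in record["copies"]:
        if hashlib.sha256(copy).hexdigest() == record["checksum"]:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("every copy is corrupted")
    # Overwrite all copies with the verified data so damaged ones are healed.
    record["copies"] = [bytearray(good) for _ in record["copies"]]
    return good
```

The cost is that each extra copy takes the full size of the data again, whereas parity schemes usually need much less extra space for the same single-failure protection.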

ZFSOnLinux is developed outside the Linux tree for two reasons: one, it is easier that way, and two, Linus does not want it in the main tree. Consequently, you need to install it in addition to the kernel, as if it were entirely userspace software. That does not add any more difficulty than, say, installing Google Chrome. :/