Hacker News
by PetahNZ 1174 days ago
Most people would be better off putting that money into a backup solution.

If your memory is (too) suspect, you can't use deduplication and compression, which amplify bitrot, and backups don't help you unless you notice the problem quickly enough.
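To make the amplification concrete, here is a toy Python sketch of a content-addressed, deduplicating store. The block size, file names and helper names are all made up, and this is not how ZFS implements dedup. Three files share one block, so a single flipped bit in that one shared copy damages all three.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def store(files):
    """Toy deduplicating store: each distinct block is kept exactly once."""
    blocks = {}     # SHA-256 digest -> the single shared copy of the block
    manifests = {}  # filename -> ordered list of block digests
    for name, data in files.items():
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            blocks.setdefault(digest, chunk)  # duplicate blocks are not stored again
            digests.append(digest)
        manifests[name] = digests
    return blocks, manifests


def read(blocks, manifest):
    """Reassemble a file from its block list (no verification on read)."""
    return b"".join(blocks[d] for d in manifest)


def flip_bit(data, bit):
    """Return a copy of data with one bit inverted (simulated memory error)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


files = {
    "a.txt": b"HEADaaaa",
    "b.txt": b"HEADbbbb",
    "c.txt": b"HEADcccc",
}
blocks, manifests = store(files)

# Flip one bit in the block that all three files reference.
shared = hashlib.sha256(b"HEAD").hexdigest()
blocks[shared] = flip_bit(blocks[shared], 0)

damaged = [name for name in files if read(blocks, manifests[name]) != files[name]]
print(len(blocks), damaged)  # 4 ['a.txt', 'b.txt', 'c.txt']
```

Because the digest key no longer matches the corrupted block, a read path that re-hashed every block would catch this particular flip. It would not catch a bit that flipped in RAM before the digest was computed, which is the case that suspect memory creates.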

Unreliable memory can corrupt your data in ways you won't notice immediately, by which point the corruption may have spread. Let's use a SQLite database as an example: an index is read into memory, a bit flips in memory, the corrupted index returns the wrong result, the corrupted cached index is used in further queries, a corrupted query result is used to update the database, and because the corrupted cache wasn't marked dirty it never gets written back to the database file. Repeatedly backing up the ticking logic bomb created this way doesn't help you.
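A simplified Python model of that chain, using the standard `sqlite3` module. The schema, the names and the application-level `index_cache` dict are hypothetical stand-ins; SQLite's own page cache works differently, but the failure mode is the same: a value corrupted in RAM is trusted, the write lands on the wrong row, and the backup copies the damage faithfully.

```python
import sqlite3

# Hypothetical schema, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
db.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 100), (3, "carol", 100)],
)

# Application-level cache of name -> row id, read into RAM once.
index_cache = {name: rid for rid, name in db.execute("SELECT id, name FROM accounts")}

# Simulated memory error: bit 1 of alice's cached row id flips (1 -> 3).
index_cache["alice"] ^= 0b10

# The corrupted cache is trusted, so alice's deposit lands on carol's row.
db.execute(
    "UPDATE accounts SET balance = balance + 50 WHERE id = ?",
    (index_cache["alice"],),
)
db.commit()

# A "backup" made afterwards copies the wrong data exactly.
backup = sqlite3.connect(":memory:")
db.backup(backup)
rows = backup.execute("SELECT name, balance FROM accounts ORDER BY id").fetchall()
print(rows)  # [('alice', 100), ('bob', 100), ('carol', 150)]
```

Nothing in the backup looks wrong: every row is well-formed, so only an independent record of what the balances should be would reveal the problem.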

Subtle data corruption due to bit-flips isn't necessarily solved by backups. You might not notice until the damage is already done, and having old good data somewhere would be no help at all.
This happened to me. Lost a bunch of data.

I had backups on an external drive that I'd periodically copy data to.

I can't remember the exact sizes, but this still explains in principle what happened.

I had a 1TB drive in my desktop and a 500GB external drive. At the time of purchase I had less than 500GB to back up.

At some point in time my desktop hard drive started corrupting data unbeknownst to me.

The amount I needed to back up grew beyond 500GB, so I purchased a new, larger backup drive. I did a full copy (corruption and all) from my desktop to the new backup drive.

At some point I repurposed the old backup drive for something else, erasing it. At this point I had irrecoverable data loss and still didn't know it.

Eventually the corruption on my desktop drive became so widespread that I noticed it. I checked my backup and discovered that a non-trivial amount of my data was corrupted.

I had a similar thing happen to me. The SATA controller probably failed.

At first it corrupted a few files. I thought nothing of it since I'd had a few power outages. Then more files. So I reformatted, but the file corruption kept happening. I moved the drive to a separate chipset with the same cable and all was good.

My current solution to this situation is a low-power PC running FreeBSD with ECC RAM and a ZFS pool of five mirrored drives. Backups are pushed to it from my main workstation, and it takes a snapshot each time. I plan to change it to a pull configuration, however. That way it will be immune to cryptolocker software performing privilege escalation attacks, since it will offer no services and the workstation will never see any credentials for it. I have to configure it using its own keyboard, though.

Even then the backups need to be tested.

> Even then the backups need to be tested.

Isn't that the role of a ZFS scrub?

Or do you mean testing if say a JPG file is still a valid JPG?
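A minimal Python sketch of the second kind of test, assuming only a structural check is wanted: a JPEG file starts with the SOI marker `FF D8` and ends with the EOI marker `FF D9`. `find_suspect_jpegs` is a hypothetical helper name; actually decoding each file with an image library would be a much stronger test.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap structural check: SOI at the start and EOI at the very end.

    Catches truncated files and wiped headers. It does NOT catch a flipped
    bit inside the compressed image data, which usually still decodes.
    Files with extra bytes appended after EOI will be reported as bad.
    """
    return data.startswith(SOI) and data.endswith(EOI)


def find_suspect_jpegs(root: str) -> list[Path]:
    """Return .jpg/.jpeg files under root that fail the marker check."""
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            if not looks_like_jpeg(path.read_bytes()):
                suspects.append(path)
    return suspects


print(looks_like_jpeg(b"\xff\xd8...image data...\xff\xd9"))  # True
print(looks_like_jpeg(b"\xff\xd8...image data..."))          # False (truncated)
```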

I think there are scripts that store an MD5 of each file in a SQLite database, for filesystems without checksumming such as XFS.
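A minimal sketch of such a script, assuming Python's standard `hashlib` and `sqlite3` modules. The table layout and the `record`/`verify` helper names are made up for illustration, and the demo uses a temporary directory. MD5 is adequate for spotting accidental corruption, though not for detecting deliberate tampering.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in RAM."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record(db: sqlite3.Connection, root: str) -> None:
    """Store (relative path, MD5) for every file under root."""
    db.execute("CREATE TABLE IF NOT EXISTS hashes (path TEXT PRIMARY KEY, md5 TEXT)")
    base = Path(root)
    for p in base.rglob("*"):
        if p.is_file():
            db.execute(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?)",
                (str(p.relative_to(base)), md5_of(p)),
            )
    db.commit()


def verify(db: sqlite3.Connection, root: str) -> list[str]:
    """Return recorded paths that are now missing or whose MD5 changed."""
    base = Path(root)
    bad = []
    for rel, expected in db.execute("SELECT path, md5 FROM hashes ORDER BY path"):
        p = base / rel
        if not p.is_file() or md5_of(p) != expected:
            bad.append(rel)
    return bad


# Demo: record hashes, silently change one byte, then verify.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "photo.jpg").write_bytes(b"original bytes")
    (Path(root) / "notes.txt").write_bytes(b"hello")
    db = sqlite3.connect(":memory:")
    record(db, root)
    (Path(root) / "photo.jpg").write_bytes(b"original byteS")  # same size, one byte differs
    flagged = verify(db, root)

print(flagged)  # ['photo.jpg']
```

On its own this cannot tell a legitimate edit from corruption; pairing each hash with the file's modification time, or keeping the database somewhere the workstation can't write to, would narrow that down.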

I meant tested before restoring. If the problem described above were to occur, I would have backups of all my files from before the corruption, though they would be spread across multiple snapshots.

Also, from my understanding, TCP/IP error checking isn't that great: https://news.ycombinator.com/item?id=25335936
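For context, the checksum carried in IPv4, TCP and UDP headers is only a 16-bit ones' complement sum (RFC 1071). It detects some errors and corrects none. A short Python sketch shows one blind spot: because addition is order-independent, swapping two 16-bit words leaves the checksum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return ~total & 0xFFFF


original = b"ABCDEFGH"
reordered = b"CDABEFGH"  # first two 16-bit words swapped in transit

print(internet_checksum(original) == internet_checksum(reordered))  # True: error undetected
```

This is one reason to verify file contents end to end (for example by comparing hashes after a copy) rather than relying on the transport alone.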

It's definitely possible to write a script that compares a file across multiple snapshots and flags it if its content changes but its modification time does not. It gets tricky when the file is modified between backups, because it could have been modified, then corrupted, then backed up. In that case, how does the script know the file has been corrupted?
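A minimal Python sketch of that script. It assumes each snapshot can be browsed as a directory (ZFS exposes snapshots read-only under `.zfs/snapshot/`); the function names are hypothetical, and the demo uses two temporary directories in place of real snapshots. As noted above, it cannot catch a file that was legitimately modified and then corrupted before the next snapshot, and tools that deliberately preserve modification times will show up as false positives.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def fingerprint(path: Path):
    """Return (mtime in nanoseconds, SHA-256 of contents) for one file."""
    st = path.stat()
    return st.st_mtime_ns, hashlib.sha256(path.read_bytes()).hexdigest()


def suspicious_changes(snapshots):
    """Compare consecutive snapshot directories, oldest first.

    Flags files whose contents changed while their mtime did not,
    which a normal application write would not do.
    """
    flagged = []
    for older, newer in zip(snapshots, snapshots[1:]):
        old_root, new_root = Path(older), Path(newer)
        for new_path in new_root.rglob("*"):
            if not new_path.is_file():
                continue
            rel = new_path.relative_to(new_root)
            old_path = old_root / rel
            if not old_path.is_file():
                continue  # file is new in this snapshot, nothing to compare
            old_mtime, old_hash = fingerprint(old_path)
            new_mtime, new_hash = fingerprint(new_path)
            if old_mtime == new_mtime and old_hash != new_hash:
                flagged.append((str(rel), newer))
    return flagged


# Demo: same file and mtime in two "snapshots", but the content differs.
with tempfile.TemporaryDirectory() as snap1, tempfile.TemporaryDirectory() as snap2:
    fixed_ns = 1_600_000_000_000_000_000
    for snap, content in ((snap1, b"good data"), (snap2, b"g00d data")):
        p = Path(snap) / "report.txt"
        p.write_bytes(content)
        os.utime(p, ns=(fixed_ns, fixed_ns))  # pin atime and mtime to the same value
    result = suspicious_changes([snap1, snap2])

print([rel for rel, _ in result])  # ['report.txt']
```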

So your local disk/RAM corrupts data and it gets pushed to the ECC box...

It's all well and good if you notice it soon enough, but for rarely touched files the good versions can drop out of retention, leaving you with only the corrupted copy.

True. The snapshots are not rolling though and I don't have much data but you are right. It's not going to be fun picking through my snapshots for individual files if they get corrupted over time.

This seems like an unavoidable issue, though, when using a workstation without ECC RAM or a copy-on-write filesystem. I thought about moving the files off my workstation to my NAS, which stores my media files. That does tick both the CoW and ECC boxes, but it isn't properly set up yet. Setting up an iSCSI target on the NAS is an option, but then it gets fiddly to recover specific files from different points in time, since I can't just browse the snapshots like any other filesystem.

Getting ECC memory into a workstation should just be "okay, you want it, pay 20% more and you get it", not a hunt for the right combination of CPU, firmware and motherboard. It's a sad state we're in.
I'm a firm believer that local persistent storage should be ripping fast and should fail fast and hard, with data loss. You need backups, it's non-negotiable.

That said, one of the prerequisites for this is that the stuff you're writing to the disk in the first place isn't corrupt.

Well, many SSDs already implement "just fucking die without any option to read data"...
Yep, my favourite kind of SSD. I don't care if you trash my photos, they're backed up. I don't care if you trash my code, it's all in git.
Backing up corrupted data isn't helpful.
hard for that data to bypass RAM, so backing up corrupt data