by stevenAthompson 913 days ago
I see this advice repeated frequently, but it's always very general.

Do you have any advice as to HOW the average home NAS user can affordably backup modern NAS devices?

The last time I looked it could easily cost hundreds of dollars per month to back up as little as 40TB to the cloud.

13 comments

Well data protection is expensive, nobody said the contrary.

Back up what you value the most, ignore what you don't, and apply tiers: separate what needs to be kept but can be transferred back home slowly from what you need immediately in case of a failure.

My rules of thumb are:

- always invest 3x the price of your hot live NAS storage in backups. If you can't afford buying 40TB of storage, you can't afford having 10TB of live storage. Period. The goal is to have at least one copy locally and one externally, with extra space on the backup storage to account for retention, changes, and to help with migrations.

- if you can't afford 3 redundant storages (RAID), favor having 3 non-redundant storages (no RAID) over having fewer copies on redundant storage.
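
To make the 3x rule concrete, here is a rough sketch of the arithmetic. It assumes price scales roughly with capacity, which is a simplification; the function name and the multiplier parameter are only illustrative.

```python
def backup_plan(live_tb: float, backup_multiplier: float = 3.0) -> dict:
    """Size the backup tier from the live tier using the 3x rule of thumb.

    Assumption: cost is roughly proportional to capacity, so "3x the price"
    is treated here as "3x the capacity".
    """
    backup_tb = live_tb * backup_multiplier
    return {
        "live_tb": live_tb,
        "backup_tb": backup_tb,
        "total_tb": live_tb + backup_tb,
    }


# 10 TB of live storage implies 30 TB of backup space, 40 TB in total.
print(backup_plan(10))
```

With 10 TB live this gives 30 TB of backup space and 40 TB overall, which matches the 40TB/10TB example in the first rule.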

An additional tip to reduce cost and avoid expensive cloud offerings is to find a reliable, trustworthy relative or friend who can host the external copy of your backup. Nebula or Tailscale now make this very easy without having to configure routers and such. In exchange, you can offer to host that person's backup storage.

Also, digitizing physical items is nice, but printing digital photos is also a great way to preserve copies. I'd rather save the photos I cherish the most than have 3 backup copies of 10TB of blurry or unremarkable photos. After years of having them all digitally, I am investing again in printing photos and making albums. You can also print a photobook multiple times and have some copies stored at a relative's place.

As the sibling says, 40 TB is not exactly "average home nas" territory. What I personally do, though I don't have 40 TB available even if I counted all my hard drives together, is I just have a second device that can hold the data and back up to it regularly.

My NAS has something like 5 TB used. It's all synced to an old server that can hold about 8 TB and that's off most of the time (no fun living next to a jet engine). This cold server lives at my parents' house.

My "really important stuff" on the NAS, which is a few hundred GB of pictures and such, is regularly backed up to a bucket with object locking.

My "super important stuff", which is my company's accounting and other such documents, and lives on my laptop, is backed up to the live NAS and handled there as the really important stuff. I also back up my laptop to two normally offline external drives, one of which lives in my apartment and the other at my parents' house.

Everything non-cloud is ZFS, so after each backup to an external drive or "cold NAS", I run a scrub to make sure it is still operational. The live NAS runs a scrub every Monday morning.
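
A minimal sketch of the "scrub after each backup" step, in case it helps. The pool name `coldpool` is made up, and the script defaults to a dry run so nothing is executed unless you ask for it. Note that `zpool scrub` only starts the scrub and returns, so the `zpool status` call right after it will normally show the scrub in progress rather than its final result.

```python
import subprocess


def post_backup_scrub_commands(pool: str) -> list[list[str]]:
    """Commands to start a scrub on a pool and then show its status."""
    return [
        ["zpool", "scrub", pool],   # starts the scrub in the background
        ["zpool", "status", pool],  # reports progress and any errors found so far
    ]


def run_post_backup_scrub(pool: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in post_backup_scrub_commands(pool):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# "coldpool" is a hypothetical pool name for the external drive or cold NAS.
run_post_backup_scrub("coldpool")
```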

Granted, this is not a "modern NAS" environment, since it made no sense to me to forgo the free servers that my employer was going to send to the trash and buy some expensive off-the-shelf solution without the guarantees of ZFS (despite the issue TFA talks about). I know about power usage, but my live NAS eats less than 50W at idle (which is 99% of the time), so breaking even with the electricity prices in France would take forever.

I agree with you completely that it's used in too trite a way. Which I think echoes backups and a lot of other "data hygiene" things in general (like doing backups at all initially, or strong passwords, or setting up new systems) which our industry has a long and unfortunate history of leaving manual and assigning a PEBKAC to when what was really needed was more automation. Manual effort doesn't scale, and cost is absolutely a critical issue for a long tail of data owners. A fundamental part of the entire value of ZFS and NAS for that matter is automating away all sorts of issues surrounding data integrity, from checksumming to disk integrity to backups, and doing so in a way that's highly dependable.

Which is how it should be. Yes bugs can happen but there's only so many 9s most of us can chase on our budgets. And "always test backups" in particular adds cost. Testing means restoring onto hardware that you can then use live, separate from your actual primary hardware or at a minimum on primary hardware with >2x the set size and enough performance to squeeze it in during downtime or around work. So yet another big increase in cost. "Testing backups" isn't trivial.

I have about that much data and LTO-6 (2.5 TB per tape), and it's a huge PITA. I'm probably doing it wrong, but this is what worked for me: making an ext4 filesystem as a file, exactly 2500 GB in size, formatting it, and stuffing it with data until there is < 5 GB free. Take the checksum and manifest of that file, and write it to tape (takes 4 hrs without verify, plus another 1-3 hrs to verify; can't remember now, it's faster). Repeat until your 40 TB is done.
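
The fill-and-checksum routine above could be planned with something like the sketch below. It is not the commenter's actual tooling: the function names are made up, files are assigned in the order given, filesystem overhead inside the image is ignored, and the tape writing itself is left out.

```python
import hashlib

TAPE_GB = 2500   # size of each image file, per the comment above
HEADROOM_GB = 5  # an image counts as full once less than this is free


def plan_tapes(file_sizes_gb, tape_gb=TAPE_GB, headroom_gb=HEADROOM_GB):
    """Greedily assign files (name -> size in GB) to tape-sized images, in order."""
    tapes, current, used = [], [], 0.0
    for name, size in file_sizes_gb.items():
        if size > tape_gb:
            raise ValueError(f"{name} does not fit on a single tape")
        if used + size > tape_gb:
            # File would overflow this image: close it and start a new one.
            tapes.append(current)
            current, used = [], 0.0
        current.append(name)
        used += size
        if tape_gb - used < headroom_gb:
            # Less than the headroom is left: treat the image as full.
            tapes.append(current)
            current, used = [], 0.0
    if current:
        tapes.append(current)
    return tapes


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Checksum a (potentially huge) image file without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical file sizes in GB.
print(plan_tapes({"a.mkv": 1500, "b.mkv": 1200, "c.mkv": 900}))
# → [['a.mkv'], ['b.mkv', 'c.mkv']]
```

Sorting the files largest-first before planning would usually pack tapes more tightly, at the cost of losing any meaningful ordering across tapes.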

I know you can use ZFS snapshots but I'm not experienced enough to trust that I could make a 20-40tb snapshot without screwing something up. Plus it's all video files so I can roughly keep track of what's what and I can ignore the stupid LTO compression.

It takes days, it's noisy, and very tedious. But that's #hoarderLyfe lol

“Average home NAS user” doesn’t have 40TB of data. With a subset of data that’s important, like photos, it’s not that expensive, and with Backblaze and other services that are directly integrated into NAS operating systems like Synology’s, it’s also not that hard to do.
I agree with the advice which is what we do. Average home user (with emphasis on average) doesn't have 40TB, but a "normal" non-professional one might.

We have about 9TB of photos. I can easily imagine someone like us, but who is into video, having more than 40TB of videos.

When will you ever be able to appreciate and look at 9T of photos?
You don't always immediately know which ones will be important.

Today you might take 10 photos of your family and keep the best one where everyone is smiling.

But 10-20 years from now you will probably appreciate having kept the other 9 where the baby is crying, the kid is making a face, and grandma has started to wander off.

AI tools analyze photos pretty well now. It’s very common they bubble up old photos I had forgotten about.
Good point, now AI is a real good excuse for thoughtless data hoarding.
When you're old and retired, and are reminiscing about your kids or grandkids back when they were small, or about past vacations.

My parents tend to take a lot of photos whenever the family is together, and it used to bother me. Only in recent years I started to understand them.

I've passed through the other end of this. I spent a few hundred hours scanning my father's and grandfather's slides, negatives, and prints on high-end scanners in 2010. There were thousands of images, and since then that number has probably increased several orders of magnitude with digital cameras and then phones. The sheer number is beyond human comprehension. Now that images are so trivial to make, I value curation much more than sheer number. I suppose it's always a quantity vs quality thing.
LTO. I bought an LTO-5 system to back up 6TB of critical data and 12TB of nice-to-have data. LTO-6 is better if you can afford it.

The downside to tape backup is you need throughput, or the ability to do disk-to-disk backups.

For 20 TB LTO seems too expensive.

20 TB of SSD costs about $1000.

Or you could get a 20 TB hard drive for $300.

Drive failure and managing those drives are hidden costs you are not considering.

I have had multiple hard drives fail and been left stranded. Tape fails, but not nearly as often as disks.

LTO6 and LTO7 are not expensive for 20TB

If you really need 40TB of irreplaceable data, then I think S3 Glacier Deep Archive might be worth looking at. According to the Amazon calculator it's something like $45/month, though of course the data might take a while to get ready if you need to restore it. There are other S3 Storage tiers as well, that are a bit more expensive but offer quicker recovery. Backblaze B2 looks like it would be about $240/month, which is IMHO also pretty reasonable for 40TB. I haven't calculated the initial traffic costs though, I assume the first upload might be a bit costly, but once it's up there, you just pay storage until you need to restore it.
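
For a quick sanity check of those monthly figures, here is a back-of-the-envelope sketch. The per-TB rates are inferred from numbers quoted in this thread (about $1/TB for Deep Archive, and $240 for 40 TB implying $6/TB for B2); they are assumptions, not current price lists, and they ignore request, upload, and retrieval charges.

```python
# Assumed flat rates in USD per TB per month, inferred from the thread.
RATES_USD_PER_TB_MONTH = {
    "glacier_deep_archive": 1.0,  # "on the order of $1/TB"
    "backblaze_b2": 6.0,          # $240/month for 40 TB implies $6/TB
}


def monthly_storage_cost(tb: float, rate_per_tb_month: float) -> float:
    """Storage-only cost per month; no request, egress, or retrieval fees."""
    return tb * rate_per_tb_month


for name, rate in RATES_USD_PER_TB_MONTH.items():
    print(f"{name}: ${monthly_storage_cost(40, rate):.2f}/month")
```

The flat Deep Archive rate gives $40/month for 40 TB, a little under the roughly $45 from the calculator, so treat the rate as a rough assumption and check the provider's own calculator before committing.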

If you can figure out how to split the data into categories, you could save money as well. E.g., which of this data is truly irreplaceable: stuff like personal photos, source code, whatever it is that can never be re-created. If you're running a business, then stuff that needs to be available immediately in order to keep the lights on. Those things need to be on storage that also gets backed up daily, preferably in full, and preferably to multiple clouds.

Stuff that can be re-created from sources (e.g., rendered outputs) is less critical because in the worst case, you can just spend some days/weeks to re-create it.

Also consider regular offline backups: put it on tape or on some hard disks/SSDs or even optical media (yes, it would take something like 400 BDXL discs to back up 40 TB, but I assume the data doesn't rapidly change) and put it in some offsite storage facility in case your place burns down.
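
The media counts are easy to check with a few lines. This sketch assumes decimal units (1 TB = 1000 GB), 100 GB per BDXL disc, the 2.5 TB LTO-6 figure mentioned elsewhere in the thread, and 20 TB hard drives; it ignores filesystem overhead and any redundancy.

```python
import math

# Assumed usable capacity per unit, in GB (decimal units).
MEDIA_GB = {
    "BDXL disc (100 GB)": 100,
    "LTO-6 tape (2.5 TB)": 2500,
    "20 TB hard drive": 20000,
}


def media_needed(total_gb: int, unit_gb: int) -> int:
    """Number of units needed to hold total_gb, rounding up."""
    return math.ceil(total_gb / unit_gb)


for name, unit_gb in MEDIA_GB.items():
    print(f"{name}: {media_needed(40_000, unit_gb)}")
# BDXL disc (100 GB): 400
# LTO-6 tape (2.5 TB): 16
# 20 TB hard drive: 2
```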

My cheap solution for large datasets is to buy a Raspberry Pi and external hard drive(s), set them up at a friend's or relative's house, and set up Syncthing. One friend has a copy of my ripped discs, my parents have copies of my photos, etc. Make sure the remote instance is in read-only mode.

For sensitive data I would run something else that can be a Restic target so backup data is encrypted; I currently use a cloud drive that supports WebDAV for that.

How do you perform the testing of these backups tho?
I don’t try to back up my Plex library. Most of my family pictures and videos are on my MBP and I rsync the picture folder a couple times a month to the NAS. Every 6 months I get my cold storage 6TB drive and back up what I can. My MBP runs Backblaze so I have another backup of my most critical items.
AWS S3 Glacier Deep Archive is really cheap nowadays (at least in some regions), on the order of $1/TB per month. As an average home NAS user with 8TB of data, I've finally taken the plunge and started backing it up. It was never worth the cost before.
How much is recovery of, let's say, 500 GB a month or 1 full restore a year?
Googling says 2c/GB, cheaper (10x) in bulk.
You might wanna double-check your math. I used the AWS pricing calculator, said I wanted to store 8000GB in Glacier Deep Archive in us-east-2, and wanted to recover it using 16000 API requests (wild guess). That, plus $0.05-$0.09/GB transfer came out to about $960 to recover.

Glacier is always super cheap as long as you don’t need to recover, and then it’s ferocious.
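
To see why restores hurt, here is a rough sketch using only per-GB figures quoted in this thread: about $0.02/GB retrieval (roughly 10x cheaper in bulk) and $0.05-$0.09/GB transfer out. These are assumptions taken from the comments, not current pricing, and API request fees are left out, so the result will not match the calculator total mentioned above.

```python
def restore_cost_range(gb, retrieval_per_gb, egress_low, egress_high):
    """Return (low, high) restore cost in USD: retrieval plus transfer out."""
    retrieval = gb * retrieval_per_gb
    return retrieval + gb * egress_low, retrieval + gb * egress_high


# Full restore of 8000 GB, standard vs. bulk retrieval (rates from the thread).
for label, retrieval_rate in (("standard", 0.02), ("bulk", 0.002)):
    low, high = restore_cost_range(8000, retrieval_rate, 0.05, 0.09)
    print(f"{label}: ${low:.0f} to ${high:.0f}")
```

Even with bulk retrieval, the transfer-out charge dominates, which is the main reason a full restore costs many months' worth of storage fees.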

I use restic to back up my NAS to Hetzner storagebox.

Also, you can probably tier your data. Maybe you don’t need same level of backup for all your 40TB.

> The last time I looked it could easily cost hundreds of dollars per month to back up as little as 40TB to the cloud.

You only have to back up the data that is important to you and that you don't want to lose in case your house gets robbed, floods, burns down, etc.

If you don't mind losing 40T of data, you don't have to back it up at all.

Otherwise get another NAS, install it at a family member's or friend's house, and set up a VPN between the two; then use rsync/zfs-send/whatever.

Cloud archival tier storage is much cheaper than that now.

Glacier vaults in S3 are quite affordable these days.