by jspiros 4305 days ago
It is running 24/7; I don't feel comfortable powering it down regularly. That is something I would worry about with the OP's setup: I wouldn't want to subject all those mechanical drives to so many power cycles over time. I don't have figures for only that machine, but my entire rack, which includes a router machine, two ISP modems, the ZFS-running machine and two SAS expanders, averages around 330 watts.

After setting it up, I wouldn't say that it requires any time to manage. Getting it all set up just right, with SMART alerts and capacity warnings and backups and snapshots, all of which I roll myself with various shell scripts, took a long time. Besides that initial investment, the only "management" I have to do is respond to any SMART alerts, add more vdevs as the pool fills up, and manage my files as I would on any other filesystem.
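
A capacity warning of the kind described above could be sketched roughly like this. It is not the poster's script (those are shell scripts and are not shown in the thread); it is a minimal Python sketch that assumes the text comes from `zpool list -H -o name,capacity`, which prints one tab-separated line per pool with the used percentage. The function name and the 80% threshold are made up for illustration.

```python
def pools_over_threshold(zpool_list_output, threshold_pct=80):
    """Return [(pool, used_pct)] for pools at or above threshold_pct.

    Expects lines such as "tank\t83%" (assumed output of
    `zpool list -H -o name,capacity`).
    """
    full = []
    for line in zpool_list_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, capacity = line.split("\t")[:2]
        used = int(capacity.strip().rstrip("%"))
        if used >= threshold_pct:
            full.append((name, used))
    return full


# Hypothetical sample output; a real script would capture it from the command.
sample = "tank\t83%\nscratch\t12%\n"
for pool, used in pools_over_threshold(sample):
    print(f"WARNING: pool {pool} is {used}% full")
```

Run from cron and piped to mail, something like this would cover the "capacity warnings" part; SMART alerts would need a separate check.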

I use the space for just about everything. Lots of backups. I use the local storage on all of my workstations as sort of "scratch" space and anything that matters in the long term is stored on the server. The highest density stuff is, of course, media: I have high definition video and photographs from my digital SLRs, I have tons of DV video obtained as part of an analog video digitization workflow, media rips, and downloaded videos. (I even have a couple of terabytes used up by a youtube-dl script that downloads all of my channel subscriptions for offline viewing, and that's something I doubt anyone would do unless they had so many terabytes free.) I keep copies of large datasets that seem important (old Usenet archives, IMDB, full Wikipedia dumps). I keep copies of old software to go with my old hardware collection. I have almost every file I've ever created on any computer in my entire life, with the exception of a handful of Zip and floppy disks that rotted away before I could preserve them, but that is only a few hundred gigabytes. I scan every paper document that I can (my largest format scanner is 12x18 inches, so anything larger than that is waiting for new hardware to arrive someday), so almost all of my mail and legal documents are on there too.

(I had a dream the other night that someone got access to this machine and deleted everything. Worst. Nightmare. Ever.)

A cloud solution would not have met my use case, since one of the primary needs I have is to be self-sufficient in terms of storing my own data, and I also want immediate local access to a lot of the things on there. I do use various cloud solutions, but only for backup, never as primary storage.

Rolling it myself was definitely cheaper than any out-of-the-box hardware solution I've seen. The computer itself is a Supermicro board with some Xeon middle-of-the-range chip and a ton of RAM, and an LSI SAS card. Connected to the SAS card are two 24-bay SAS expander chassis, which contain the drives, which are all SATA.

I'd say that building something like this would cost you maybe about 4000USD, not counting the cost of the drives. The drives were all between $90 and $120 when I bought them, but of course capacity eventually started going up for the same price over time, so let's say another 3500USD for the drives.


With all that data onsite, what are you doing for off-site backups?
I've got some external hard drives that I rotate in and store off-site but still nearby (down the street) for some data. I also have constant online backups running to various locations/services (Linode, AWS, CrashPlan, Dreamhost, some private services). I don't back up everything, only the irreplaceable personal data (so, I'm not backing up Wikipedia dumps); at current count, at most 6TiB of the data is irreplaceable.
Would you recommend building something like this for a much smaller system? 10TiB or so maybe, I do not need that much, or do you think buying a NAS of some kind would be better?

I kind of want to set something like this up while spending the least amount of money. I am comfortable enough with Debian/Linux to do most things, but I have never managed anything like this. In the end I want to end up with somewhere relatively safe to store data pretty much in the same way you are, I just do not need 70TiB, and I have no experience with ZFS/hardware stuff/storage.

By "something like this", do you mean ZFS? I am a HUGE fan of ZFS, and I do think that it's worth using in any situation where data integrity is a high priority.

As far as ZFS on Linux, it still has its wrinkles. I use it because, like you, I'm comfortable with Debian, and I didn't want to maintain a foreign system just for my data storage, and I still wanted to use the machine for other things too. (I actually started with zfs-fuse, before ZFS on Linux was an option.)

So, I don't know. If you just want a box to store stuff on, you might want to just look into FreeNAS, which is a FreeBSD distribution that makes it very easy to set up a storage appliance based on ZFS. FreeBSD's ZFS implementation is generally considered production-ready, so you avoid some ZFS on Linux wrinkles, too.

So, I'd recommend checking out the FreeNAS website, and maybe also http://www.reddit.com/r/datahoarder/ for ideas/other opinions. I do a lot of things in weird idiosyncratic ways, so I'm not sure I'd recommend anyone do it exactly how I have. :)

If you're comfortable with Debian then you shouldn't have too many issues with FreeBSD as there is a lot of transferable knowledge between the two (FreeBSD even supports a lot of GNU flags which most other UNIXes don't).

Plus FreeBSD has a lot of good documentation (and the forums have proven a good resource in the past too) - so you're never going it alone (and obviously you have the usual mailing groups and IRC channels on Freenode).

While I do run quite a few Debian (amongst other Linux) machines, I honestly find my FreeBSD server to be the most enjoyable / least painful platform to administrate. Obviously that's just personal preference, but I would definitely recommend trying FreeBSD to anyone considering ZFS.

As far as I'm concerned, the most identifiable characteristic of Debian is the packaging system, dpkg/apt. I've used FreeBSD occasionally, and that's what I always end up missing about Debian. I did consider going with Nexenta or Debian GNU/kFreeBSD, but whatever, ZoL works well enough. :)
FreeBSD 10 has switched to a new package manager, so it might be worth giving it another look next time you're bored and fancy trying something new.

I can understand your preference though. I'm not a fan of apt much personally, but pacman is one of the reasons I've stuck with ArchLinux over the years - despite its faults :)

I'll keep that in mind; I do sometimes find myself with some time to play with things. :)
By 'something like this' I meant pretty much what you just said: Would you do it the same way (your own everything) if you needed a much smaller system, or would you go with something like FreeNAS, like you suggested? I am confident I can get it working well either way, but I would rather not spend half my days having to tweak and worry about stuff working correctly. I understand that it will need maintenance and monitoring of course, but I would much rather be more of an end user with a working system than the sysadmin who has to fix it all the time. :-)

Thanks for the link, I will take a look there.

Well, if you don't get a kick out of "tweaking and worrying", yes, I definitely recommend FreeNAS. Although I'm confident in my system now, it took a long time to get this way, and I could've saved hundreds of hours by just going with something like FreeNAS (had it existed); I stuck with it because I kinda enjoy doing things the hard way.
I do kind of get a kick out of that, but at the same time I also just want a safe system for storing data. If I end up building something like this I will take a look at FreeNAS! Thanks!
I have a similar setup with 12TB capacity. ext4 over mdadm RAID-6 w/ 2 spare drives. It's specifically set up such that any single failure (including the SATA expansion card) can't bring down the pool. It's been stable for ~2 years, and it's really nice to have that much storage in the house.

You don't need ZFS for this, as cool as it is.

ZFS still protects you from bitrot when compared to ext4 over mdraid. When you get to many terabytes of data, it's almost guaranteed that you're going to lose something to bitrot. In my case, my most recent scrub detected and repaired 1.58MB of bitrot. And in any given month, `zpool status` will show one or two checksum errors as having been corrected in real-time, as I was working with the corresponding files directly.
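
To keep an eye on those counters without reading `zpool status` by hand, the CKSUM column can be pulled out with a few lines of Python. This is only a sketch: it assumes the usual `NAME STATE READ WRITE CKSUM` table layout and plain integer counters (large counts may be printed in abbreviated form, which this ignores), and the function name and sample text are made up.

```python
def checksum_errors(zpool_status_output):
    """Map device name -> CKSUM count for every row with a nonzero count."""
    errors = {}
    for line in zpool_status_output.splitlines():
        parts = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM
        if len(parts) >= 5 and all(p.isdigit() for p in parts[2:5]):
            cksum = int(parts[4])
            if cksum > 0:
                errors[parts[0]] = cksum
    return errors


# Hypothetical excerpt of `zpool status` output, for illustration only.
sample = """\
  pool: tank
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       0     0     2
        sdb     ONLINE       0     0     0
"""
print(checksum_errors(sample))
```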

This is probably the number one thing that excites people about ZFS over any other solution, and it's something that isn't really easily implemented on a standard RAID + standard filesystem arrangement, since this sort of functionality depends on the filesystem knowing about the underlying disk arrangement.

"ZFS uses its end-to-end checksums to detect and correct silent data corruption. If a disk returns bad data transiently, ZFS will detect it and retry the read. If the disk is part of a mirror or RAID-Z group, ZFS will both detect and correct the error: it will use the checksum to determine which copy is correct, provide good data to the application, and repair the damaged copy."

https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
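
The mechanism in that quote can be illustrated with a toy mirror. This is a sketch of the idea, not ZFS code: SHA-256 stands in for whichever checksum the pool actually uses, the "disks" are just a Python list, and the expected checksum is kept apart from the data it describes, which is what lets the reader decide which copy is the good one.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def read_mirrored(copies, expected):
    """Return a copy that matches `expected` and overwrite any copy that differs.

    `copies` is a mutable list of bytes objects, one per mirror side.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; nothing to repair from")
    for i, copy in enumerate(copies):
        if copy != good:
            copies[i] = good  # "repair the damaged copy"
    return good


data = b"important bytes"
expected = checksum(data)          # stored separately from the block itself
copies = [b"important bytEs", data]  # first side silently corrupted

assert read_mirrored(copies, expected) == data
assert copies == [data, data]      # the bad side has been rewritten
```

Without the separately stored checksum, the two sides would simply disagree and there would be no principled way to pick one.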

How is that any different from standard RAID? That's exactly the problem RAID was created to solve...
ZFS & btrfs detect and fix silent corruption - where no errors are emitted from the hardware.

I think the pertinent question is: when the filesystem goes to read a 4K block, and one drive's copy of this block in the RAID-1 set is different to its counterpart 4K block on another disk, which one wins?

I didn't specify RAID-1. RAID-5 or RAID-6 can reconstruct the correct value in a silent fail.
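
For reference, single-parity reconstruction in the RAID-5 style looks roughly like this. It is a toy sketch with made-up block contents: parity is the XOR of the data blocks, and a block that is known to be missing or bad can be rebuilt from the remaining blocks plus parity. How the array learns which block is bad when no error is reported is a separate question that this sketch does not answer.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data disks
parity = xor_blocks(data)            # what the parity disk would hold

# Suppose block 1 is known to be lost: XOR of the survivors and parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A silently flipped byte makes the stripe inconsistent with its parity,
# but the mismatch alone does not say which block changed.
corrupted = [data[0], b"BBXB", data[2]]
assert xor_blocks(corrupted) != parity
```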

Honest question: how often do drives silently fail? Drives contain per-sector checksums these days, explicitly to prevent this problem.

A normal raid does not check the checksums on read. It only uses them after a device failure.

Also, it may have copies of the data (e.g. RAID 1) but does not know which is correct if they differ.

No, every default mdadm install performs a complete scrub on the first Sunday of the month. Every block of the array is read back and validated. For RAID modes with parity (e.g. RAID-5, RAID-6) it is able to detect and fix the offending disk when a silent error occurs. You can trigger such a scrub whenever you want (I run mine once a week).
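
Triggering such a scrub by hand goes through the md sysfs interface: writing `check` to `/sys/block/<array>/md/sync_action` starts the pass, and `mismatch_cnt` in the same directory reports what it found. A minimal Python sketch, with hypothetical function names and an `md0` array name that is only an example; the demo runs against a temporary directory standing in for sysfs so it is safe to try without root or a real array.

```python
import tempfile
from pathlib import Path


def request_check(array="md0", sysfs_root="/sys/block"):
    """Ask the md driver to start a read-and-compare pass over the array."""
    action = Path(sysfs_root) / array / "md" / "sync_action"
    action.write_text("check\n")
    return action


def mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Read the mismatch counter reported after a check."""
    return int((Path(sysfs_root) / array / "md" / "mismatch_cnt").read_text())


# Demo against a fake sysfs tree; on a real system, call the functions with
# the default sysfs_root as root instead.
with tempfile.TemporaryDirectory() as fake_sysfs:
    md_dir = Path(fake_sysfs) / "md0" / "md"
    md_dir.mkdir(parents=True)
    (md_dir / "mismatch_cnt").write_text("0\n")

    request_check("md0", fake_sysfs)
    print((md_dir / "sync_action").read_text().strip())
    print(mismatch_count("md0", fake_sysfs))
```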
That's interesting. I've been running a 6x3TB raidz2 on WD Reds for a year or so, with regular scrubs, and have seen no bitrot and no checksum errors so far.
Almost all of the bitrot I see is on the oldest vdevs, which at this point probably contain mostly only old snapshots that are almost never accessed. My oldest vdevs are... 4-5 years old.