by pedrocr 4598 days ago
I may have spoken too soon. One of my servers has 2 Samsung Green and 2 WD Green drives in RAID6. Here's the SMART value that you seem to be discussing:

  $ for dev in /dev/sd?; do echo "$dev"; sudo smartctl -A "$dev" | awk '/Load_Cycle_Count/ {print $2, $10}'; done
  /dev/sda
  Load_Cycle_Count 24
  /dev/sdb
  Load_Cycle_Count 24
  /dev/sdc
  Load_Cycle_Count 1947798
  /dev/sdd
  Load_Cycle_Count 1907706
sda and sdb are the Samsungs and sdc and sdd are the WDs. I also just replaced a failing Samsung Green drive in another machine with a WD and it already has a Load_Cycle_Count in the 10000s. I guess I need to start avoiding Green WDs at least, maybe the Greens altogether.
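
For anyone who wants to keep an eye on this, the check above can be turned into a small warning script. This is only a minimal sketch: it assumes `smartctl -A` prints its usual ten-column attribute table with the raw value in the last column, the function names are made up for illustration, and the 300000 default limit is an assumed placeholder rather than a figure from this thread, so look up the rated load/unload cycles for your own drive.

```shell
# Extract the raw Load_Cycle_Count from `smartctl -A` output read on stdin.
# Assumes the standard attribute table: column 2 is the attribute name
# and column 10 is RAW_VALUE.
load_cycle_count() {
  awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Report whether a count is above a limit.
# The default limit of 300000 is an assumed placeholder, not a datasheet value.
check_load_cycles() {
  dev=$1
  count=$2
  limit=${3:-300000}
  if [ -z "$count" ]; then
    echo "$dev: no Load_Cycle_Count reported"
  elif [ "$count" -gt "$limit" ]; then
    echo "$dev: Load_Cycle_Count $count exceeds $limit"
  else
    echo "$dev: Load_Cycle_Count $count ok"
  fi
}

# Example using two of the counts posted above (no disk access needed).
check_load_cycles /dev/sda 24
check_load_cycles /dev/sdc 1947798
```

On a live system the count would come from something like `count=$(sudo smartctl -A /dev/sdc | load_cycle_count)` before calling `check_load_cycles /dev/sdc "$count"`.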

I'm curious, why run 4 disks in RAID6 instead of RAID10? You lose two disks' worth of capacity to RAID in either case, but with RAID6 there's also parity overhead, slower recovery, and slower performance, especially in degraded mode.
RAID10 only survives a second disk failure in some scenarios: if the second failure hits the same mirror pair as the first, the array is lost.
Yep, that's it. The servers I run are all personal and their main workload is keeping my files safe. Being able to survive 2 drive failures in all situations is important. I just discovered in this thread that on that RAID6 array 2 drives are actually suffering from excessive head-parking. So using RAID6 and buying 2 different types of drives gave me some insurance against those two drives failing simultaneously. The performance is fine anyway.
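
To put a number on that, here is a small sketch that enumerates every possible two-disk failure in a 4-disk RAID10. It assumes disks 0 and 1 form one mirror pair and disks 2 and 3 the other, and the function names are made up for illustration. RAID6 survives any two failures by design, so only the RAID10 case needs counting.

```shell
# Assumed layout: disks 0,1 are mirror pair A; disks 2,3 are mirror pair B.
# Succeeds (exit 0) when a RAID10 with that layout survives losing both
# disks given as arguments.
raid10_survives() {
  # Data is lost only when both failed disks are in the same mirror pair.
  [ $(( $1 / 2 )) -ne $(( $2 / 2 )) ]
}

# Count how many of the distinct two-disk failures are survivable.
count_raid10_survivable_pairs() {
  survived=0
  total=0
  for a in 0 1 2 3; do
    for b in 0 1 2 3; do
      # Visit each unordered pair once.
      [ "$a" -lt "$b" ] || continue
      total=$((total + 1))
      if raid10_survives "$a" "$b"; then
        survived=$((survived + 1))
      fi
    done
  done
  echo "$survived/$total"
}

count_raid10_survivable_pairs   # prints 4/6
```

So under these assumptions a third of the possible double failures take out a 4-disk RAID10, while none of them take out RAID6.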
And for anyone that doesn't think this is worthwhile:

We recently had 3 servers each lose two drives within hours of each other, with about two weeks between each of the 3 servers. These were 3 out of 4 servers that had been configured at the same time, with drives from the same delivery - clearly something had gone wrong.

Usually we try to mix drive types, but we didn't have enough suitable drives when we had to bring these up. Thankfully we do have everything replicated multiple times and very much specifically avoided replicating things from one of the new servers to another.

When we brought them back online we got a chance to juggle drives around, so now they're nicely mixed in case we get more failures.

For my private setup, I've gone with a mirror + rsync to a separate drive with an entirely different filesystem + Crashplan. Setups like that seem paranoid until you suffer a catastrophic loss or near loss...

My first big scare like that was a decade or so ago, when we had a 10 or 12 drive array of IBM Deathstars (Deskstars) that started failing, one drive after the other, about a week apart, and the array just barely managed to rebuild. The rebuilds slowed the array down so much that we were unable to complete a backup a day while running our service too, and taking downtime was unacceptable. So our backups lagged further and further behind while we waited for the next drive failure. Those were some tense weeks.

That's a great example of what I worry about. On my first server I bought 4 identical drives when I built it and then when I needed more space I again bought 4 identical larger drives. Since this was all on RAID5 the risk was actually pretty high. On my second server I bought two of each manufacturer and used RAID6 so now I can survive a whole batch going wrong at the same time. Next time I need to build one from scratch I may even go for 4 different drives (mixing red/green/etc).

What I am doing now as soon as I get unrecoverable errors from the drives is to replace them one at a time with whatever is the best cost/TB drive. Whenever all the drives have been upgraded I can resize the array to the new minimum drive size.
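
The "new minimum drive size" part is just arithmetic: RAID6 uses only as much of each member as the smallest drive provides and spends two drives' worth on parity, so usable space is (n - 2) × smallest drive. A rough sketch, assuming sizes in whole GB and ignoring metadata overhead; the function name and the example sizes are made up for illustration.

```shell
# Usable capacity of a RAID6 array: (number of drives - 2) * smallest drive.
# Arguments: drive sizes in whole GB. Metadata overhead is ignored.
raid6_usable_gb() {
  n=$#
  if [ "$n" -lt 4 ]; then
    echo "RAID6 needs at least 4 drives" >&2
    return 1
  fi
  min=$1
  for size in "$@"; do
    if [ "$size" -lt "$min" ]; then
      min=$size
    fi
  done
  echo $(( (n - 2) * min ))
}

# Illustrative sizes only: three drives upgraded, one still at 2000 GB.
raid6_usable_gb 4000 4000 4000 2000   # prints 4000
# Once the last small drive is replaced, the array can grow.
raid6_usable_gb 4000 4000 4000 4000   # prints 8000
```

Which is why the extra space only shows up after the last of the old drives has been swapped out.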