by jws 4598 days ago
Greens have had problems with aggressive head parking. If you have an idle set of them you can go through their design limit of head parks in a couple of months and start to get failures shortly after. Done that.

Check your S.M.A.R.T. data. Look at the head park number. (Load cycles, I think it is called; can't look it up now.) If it is a six-digit number, you are in trouble. For a server you want it to be on the same order as the number of power-ups. Anything else and you have to explain to yourself "why?"
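
For anyone who wants to automate that comparison, here is a minimal sketch. It assumes the attribute names that smartctl prints on most drives (Load_Cycle_Count and Power_Cycle_Count) and the usual ten-column layout of `smartctl -A`. The function name and the 100x ratio threshold are made up for illustration, not a vendor figure.

```shell
# Reads `smartctl -A` output on stdin and compares head parks to power-ups.
# The default ratio of 100 is an arbitrary assumption; pass another as $1.
check_load_cycles() {
  awk -v max_ratio="${1:-100}" '
    $2 == "Load_Cycle_Count"  { load = $10 }    # column 10 is RAW_VALUE
    $2 == "Power_Cycle_Count" { power = $10 }
    END {
      if (load == "" || power == "") { print "UNKNOWN"; exit }
      status = (load > power * max_ratio) ? "SUSPECT" : "OK"
      printf "%s load_cycles=%d power_cycles=%d\n", status, load, power
    }'
}

# Usage on a real system (needs smartmontools and root; /dev/sda is an example):
#   sudo smartctl -A /dev/sda | check_load_cycles
```

It prints UNKNOWN when either attribute is missing, which happens on drives that do not report load cycles at all.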

Edit: adding. The 1TB and smaller greens were disasters. I ruined a lot of them. I was told all of the 2TB and up greens didn't have head park issues, but spent part of last week replacing a storage unit populated with 2TB greens when a spindle failed (>200 unrecoverable blocks) and found that some of the 2TB greens were load cycling into the 200000 range, while others weren't running up. They were all identical models purchased at the same time. Maybe they had different firmware? I replaced them with REDs. They aren't supposed to park, and they won't try to recover a bad sector for more than a few seconds, so they don't hang your RAID when they get bad sectors.


As someone who inherited 240 WD Greens running 24/7: http://idle3-tools.sourceforge.net/ works fine, but disabling the timer has a negative performance impact. 3000 seconds is fine though. But you need a complete power cycle before the changes take effect. No more parking. Does make a difference in longevity, in my not very scientific opinion.
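
A rough sketch of what raising the timer could look like. The `-g`, `-s` and `-d` options are how I understand the idle3ctl tool from that project to work. The raw-value mapping in the helper (1 to 128 in 0.1 s steps, 129 to 255 in 30 s steps) is an assumption taken from my reading of its documentation and may not hold for every firmware, so read the value back with `-g` before trusting it. The helper name is made up.

```shell
# Converts a target idle time in whole seconds to a raw idle3 timer value,
# ASSUMING raw 1-128 means 0.1 s units and raw 129-255 means (raw - 128) * 30 s.
idle3_raw_for_seconds() {
  secs=$1
  if [ "$secs" -lt 1 ] || [ "$secs" -gt 3810 ]; then
    echo "seconds must be between 1 and 3810" >&2
    return 1
  fi
  if [ "$secs" -le 12 ]; then
    echo $((secs * 10))          # short timeouts: 0.1 s resolution
  else
    echo $((128 + secs / 30))    # long timeouts: 30 s resolution, rounded down
  fi
}

# Hypothetical usage on a real WD drive (needs root; /dev/sdX is a placeholder):
#   idle3ctl -g /dev/sdX                                   # show the current raw value
#   idle3ctl -s "$(idle3_raw_for_seconds 3000)" /dev/sdX   # roughly 3000 seconds
#   idle3ctl -d /dev/sdX                                   # or disable the timer
```

Under that assumed mapping, values from 13 to 29 seconds all collapse to raw 128, which is 12.8 s.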

I can second the >200 bad blocks. Sometimes they still work fine after running badblocks -w on them a few times and raising the timer.

Good to know. I JUST bought a green WD drive (still in transit from Amazon) so my future thanks you.
I assume you mean Start_Stop_Count. A quick check on two servers, each with 4 green drives in RAID6 and RAID5 setups, tells me this hasn't been a problem. Both have Start_Stop_Count values below 100 (on the order of the number of boots the servers have had). I don't see any other number that could be the head park.

The number I have been finding to be high is Hardware_ECC_Recovered (values between 1036555546 and 2699460003). Not sure if that's normal. I've also had two 1.5TB drives now end up with unrecoverable sectors. RAID recovers from that just fine, but I've been replacing them as it keeps reoccurring and is supposed to be a signal of failing disks. These 1.5TB drives are 3+ years old and I've been thrashing them a bit lately. I'd have expected them to last longer though.

I may have spoken too soon. One of my servers has 2 Samsung Green and 2 WD Green drives in RAID6. Here's the SMART value that you seem to be discussing:

  $ for dev in /dev/sd?; do echo "$dev"; sudo smartctl -A "$dev" | awk '/Load_Cycle_Count/ {print $2, $10}'; done
  /dev/sda
  Load_Cycle_Count 24
  /dev/sdb
  Load_Cycle_Count 24
  /dev/sdc
  Load_Cycle_Count 1947798
  /dev/sdd
  Load_Cycle_Count 1907706
sda and sdb are the Samsungs and sdc and sdd are the WDs. I also just replaced a failing Samsung Green drive in another machine with a WD and it already has a Load_Cycle_Count in the 10000s. I guess I need to start avoiding Green WDs at least, maybe Greens altogether.
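
To judge how fast a new drive is burning through its parks, dividing Load_Cycle_Count by Power_On_Hours gives a rate. A minimal sketch, assuming both attributes appear in `smartctl -A` with a plain number in the raw column. The 300000 default limit is a commonly quoted rating that nobody in this thread stated, so treat it as an assumption and pass your drive's own figure. The function name is made up.

```shell
# Reads `smartctl -A` output on stdin; prints load cycles per power-on hour
# and the hours left until a rated limit (default 300000 is an ASSUMPTION).
park_rate() {
  awk -v limit="${1:-300000}" '
    $2 == "Load_Cycle_Count" { load = $10 + 0 }
    $2 == "Power_On_Hours"   { hours = $10 + 0 }
    END {
      if (hours <= 0) { print "no Power_On_Hours data"; exit }
      rate = load / hours
      printf "cycles_per_hour=%.1f", rate
      if (load >= limit) { printf " over_limit=yes\n" }
      else if (rate > 0) { printf " hours_to_limit=%d\n", (limit - load) / rate }
      else { printf " hours_to_limit=n/a\n" }
    }'
}

# Usage on a real system (needs smartmontools and root; /dev/sdc is an example):
#   sudo smartctl -A /dev/sdc | park_rate
```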
I'm curious, why run 4 disks in RAID6 instead of RAID10? You lose two disks' worth of capacity to RAID in either case, but with RAID6 there's also parity overhead, slower recovery, and slower performance, especially in degraded mode?
RAID10 only gives you room for one disk failure in some scenarios.
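
To put a number on "some scenarios", this small sketch enumerates every way two disks out of four can fail. It assumes a RAID10 built from two mirror pairs, disks (0,1) and (2,3); the layout and function name are illustrative.

```shell
# Enumerate every pair of failed disks in a 4-disk array and count how many
# a RAID10 made of mirrors (0,1) and (2,3) survives. RAID6 survives any two.
raid10_two_failure_survival() {
  survived=0
  total=0
  for a in 0 1 2 3; do
    for b in 0 1 2 3; do
      [ "$a" -lt "$b" ] || continue   # count each unordered pair once
      total=$((total + 1))
      # Data is lost only when both failures land in the same mirror pair.
      if [ $((a / 2)) -ne $((b / 2)) ]; then
        survived=$((survived + 1))
      fi
    done
  done
  echo "RAID10 survives $survived of $total two-disk failures; RAID6 survives $total of $total"
}

raid10_two_failure_survival
```

So with this layout a second failure is fatal in 2 of the 6 possible combinations, which is the gap RAID6 closes.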
Yep, that's it. The servers I run are all personal and their main workload is keeping my files safe. Being able to survive 2 drive failures in all situations is important. I just discovered in this thread that on that RAID6 array 2 drives are actually suffering from excessive head-parking. So having used RAID6 and bought 2 different types of drives bought me some insurance against the simultaneous failure of the two drives. The performance is fine anyway.
And for anyone that doesn't think this is worthwhile:

We recently had 3 servers have two drives each fail within hours of each other, with about two weeks between each of the 3 servers. These were 3 out of 4 servers that had been configured at the same time, with drives from the same delivery - clearly something had gone wrong.

Usually we try to mix drive types, but we didn't have enough suitable drives when we had to bring these up. Thankfully we do have everything replicated multiple times and very much specifically avoided replicating things from one of the new servers to another.

When we brought them back online we got a chance to juggle drives around, so now they're nicely mixed in case we get more failures.

For my private setup, I've gone with a mirror + rsync to a separate drive with an entirely different filesystem + Crashplan. Setups like that seem paranoid until you suffer a catastrophic loss or near loss...

My first big scare like that was a decade or so ago, when we had a 10 or 12 drive array of IBM "Deathstars" (Deskstars) that started failing, one drive after the other, about a week apart, and the array just barely managed to rebuild each time. It was particularly bad because the rebuilds slowed the array down so much that we were unable to complete a backup a day while running our service too, and taking downtime was unacceptable. So our backups lagged further and further behind while we waited for the next drive failure. Those were some tense weeks.

This might be anecdotal, but external WD (MyLife) drives are usually from the Green series, and I had 2 different ones fail on me after about 1 year of use. The same happened to 2 friends of mine. I blamed it on the constant head parking (it went idle after 10 minutes without use).
I have 4 WD greens, one from an external enclosure and the others internal drives, the first being maybe 3 years old now. All are still going fine. Also anecdotal.