by velodrome 4598 days ago
Hard disk drive quality has dropped over the last few years.

* Most consumer drives over 2TB have extremely poor reliability. Just check any Amazon or Newegg review (DOA and early mortality show up more frequently). Yes, I know reviews are not an accurate measure, but since there is no public information on drive failure rates there is not really much else to go on.

* Manufacturers reduced their warranties after the Thailand floods. Surprise, they never changed them back to the original 3 year warranty.

If you have a large array of disks, there is nothing to really worry about. If you have a small set of drives, spend a little extra and get the "Black" or RE drives with a 5 year warranty. Avoid any "Green" drive.


I have to suspect it's because bleeding-edge drives have to be over-rated to compete. They have better margins, which should mean they could afford better quality. But they can barely make the drive at all, and they want to ship before reliability is quite there, so top-end drives are sketchy.
Why avoid the green drives? I assumed those were less power hungry and spinning slower so more reliable. I've been ordering them for RAID5 arrays and not had too many issues yet.
Greens have had problems with aggressive head parking. If you have an idle set of them you can go through their design limit of head parks in a couple of months and start to get failures shortly after. Done that.

Check your S.M.A.R.T. data. Look at the head park number. (Load cycles I think it is called, can't look it up now). If it is a six digit number, you are in trouble. For a server you want it to be of the same order as the number of power ups. Anything else and you have to explain to yourself "why?"
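
A rough sketch of that check, assuming `smartctl -A` reports the usual Load_Cycle_Count and Power_Cycle_Count attributes with the attribute name in column 2 and the raw value in column 10. The function name check_load_cycles and the 10x cutoff for "same order" are made up for illustration; only the six-digit rule of thumb comes from the paragraph above.

```shell
# Reads `smartctl -A` output on stdin and prints a one-line verdict.
# Assumption: attribute name is field 2 and the raw value is field 10.
check_load_cycles() {
    awk '
        BEGIN { power = 0; seen = 0 }
        $2 == "Load_Cycle_Count"  { load = $10 + 0; seen = 1 }
        $2 == "Power_Cycle_Count" { power = $10 + 0 }
        END {
            if (!seen) {
                print "no Load_Cycle_Count reported"
            } else if (load >= 100000) {
                # Six digits or more: the rule of thumb says trouble.
                print "trouble: " load " load cycles"
            } else if (load <= 10 * power) {
                # Illustrative cutoff for "same order as power ups".
                print "ok: " load " load cycles, " power " power cycles"
            } else {
                print "check: " load " load cycles vs " power " power cycles"
            }
        }
    '
}
```

Usage would be something like `sudo smartctl -A /dev/sda | check_load_cycles`, once per drive.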

Edit: adding. The 1TB and smaller greens were disasters. I ruined a lot of them. I was told all of the 2TB and up greens didn't have head park issues, but spent part of last week replacing a storage unit populated with 2TB greens when a spindle failed (>200 unrecoverable blocks) and found that some of the 2TB greens were load cycling into the 200000 range, while others weren't running up. They were all identical models purchased at the same time. Maybe they had different firmware? I replaced them with REDs. They aren't supposed to park, and they won't try to recover a bad sector for more than a few seconds, so they don't hang your RAID when they get bad sectors.

As someone who inherited 240 WD Greens running 24/7: http://idle3-tools.sourceforge.net/ works fine, but disabling the timer has a negative performance impact. 3000 seconds is fine though. But you need a complete power cycle before the changes take effect. No more parking. It does make a difference in longevity, in my not very scientific opinion.
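
A small sketch for working out the raw timer value, under an assumption that is not stated in this thread: that the idle3 raw value counts 0.1 s steps for 1..128 and 30 s steps for 129..255, which is how I read the idle3-tools documentation and which may differ by firmware. The helper name idle3_raw_from_seconds is hypothetical.

```shell
# Convert a timeout in seconds to an idle3 raw timer value.
# Assumed encoding (check the idle3-tools docs for your drive):
#   raw 1..128   = units of 0.1 s
#   raw 129..255 = (raw - 128) * 30 s
idle3_raw_from_seconds() {
    secs=$1
    if [ "$secs" -lt 1 ]; then
        echo "timeout must be at least 1 second" >&2
        return 1
    fi
    if [ "$secs" -le 12 ]; then
        raw=$((secs * 10))                # 0.1 s steps
    else
        raw=$((128 + (secs + 29) / 30))   # 30 s steps, rounded up
    fi
    if [ "$raw" -gt 255 ]; then
        raw=255                           # longest timeout the field can hold
    fi
    echo "$raw"
}
```

Under that assumption, 3000 seconds comes out as raw value 228. If I remember the idle3ctl options right, `-s` sets the raw value and `-g` reads it back, so verify with `-g` after the power cycle.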

I can second the >200 bad blocks. Sometimes they still work fine after running badblocks -w on them a few times and raising the timer.

Good to know. I JUST bought a green WD drive (still in transit from Amazon) so my future thanks you.
I assume you mean Start_Stop_Count. A quick check on two servers, each with 4 green drives in RAID6 and RAID5 setups, tells me this hasn't been a problem. Both have Start_Stop_Count values below 100 (on the order of the number of boots the servers have had). I don't see any other number that could be the head park.

The number I have been finding to be high is Hardware_ECC_Recovered (values between 1036555546 and 2699460003). Not sure if that's normal. I've also had two 1.5TB drives now end up with unrecoverable sectors. RAID recovers from that just fine, but I've been replacing them as it keeps reoccurring and is supposed to be a signal of failing disks. These 1.5TB drives are 3+ years old and I've been thrashing them a bit lately. I'd have expected them to last longer though.

I may have spoken too soon. One of my servers has 2 Samsung Green and 2 WD Green drives in RAID6. Here's the SMART value that you seem to be discussing:

  $ for dev in `ls /dev/sd?`; do echo $dev; sudo smartctl -a $dev | grep Load_Cycle_Count; done | cut -d " " -f 2,40
  /dev/sda
  Load_Cycle_Count 24
  /dev/sdb
  Load_Cycle_Count 24
  /dev/sdc
  Load_Cycle_Count 1947798
  /dev/sdd
  Load_Cycle_Count 1907706
sda and sdb are the Samsungs and sdc and sdd are the WDs. I also just replaced a failing Samsung Green drive in another machine with a WD and it already has a Load_Cycle_Count in the 10000s. I guess I need to start avoiding Green WDs at least, maybe the Greens altogether.
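
To compare a new drive against old ones, the raw count is less telling than the rate. A minimal sketch, assuming the drive also reports Power_On_Hours as a plain integer in the same `smartctl -A` layout (name in column 2, raw value in column 10); load_cycles_per_hour is a made-up helper name.

```shell
# Reads `smartctl -A` output on stdin, prints load cycles per power-on hour.
# Assumption: Power_On_Hours raw value is a plain integer in field 10.
load_cycles_per_hour() {
    awk '
        $2 == "Load_Cycle_Count" { load = $10 + 0 }
        $2 == "Power_On_Hours"   { hours = $10 + 0 }
        END {
            if (hours > 0) printf "%.1f\n", load / hours
            else print "unknown"
        }
    '
}
```

For example: `sudo smartctl -A /dev/sdc | load_cycles_per_hour`. A drive that parks only on power-down should print a number close to zero.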
I'm curious, why run 4 disks in RAID6 instead of RAID10? You lose two disks' worth of capacity to RAID in either case, but with RAID6 there's also parity overhead, slower recovery, and slower performance, especially in degraded mode.
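RAID10 only gives you room for one disk failure in some scenarios: if the second failure hits the same mirror pair, the array is lost, whereas RAID6 survives any two failures.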
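
That can be counted out for the 4-disk case. A sketch assuming the RAID10 array mirrors disks (0,1) and (2,3); the function name count_survivable_pairs is just for illustration.

```shell
# Count which two-disk failures a 4-disk array survives.
# Assumed layout: RAID10 mirrors disks (0,1) and (2,3); RAID6 tolerates any two.
count_survivable_pairs() {
    total=0; raid10=0; raid6=0
    for a in 0 1 2 3; do
        for b in 0 1 2 3; do
            [ "$a" -lt "$b" ] || continue     # visit each unordered pair once
            total=$((total + 1))
            raid6=$((raid6 + 1))              # any two failures are recoverable
            # RAID10 is lost only when both failures hit the same mirror pair.
            if [ $((a / 2)) -ne $((b / 2)) ]; then
                raid10=$((raid10 + 1))
            fi
        done
    done
    echo "RAID10 survives $raid10 of $total, RAID6 survives $raid6 of $total"
}
```

So with 4 disks, two of the six possible double failures take out RAID10, and none take out RAID6.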
This might be anecdotal, but external WD (MyLife) drives are usually from the Green series and I had 2 different ones fail on me after about 1 year of use. The same happened to 2 friends of mine. I blamed it on the constant head parking (it went idle after 10 minutes of inactivity).
I have 4 WD Greens, one from an external enclosure and the others internal drives, the first being maybe 3 years old now. All are still going fine. Also anecdotal.
My experience is that they spin down way too often for server usage, and eventually break much faster than others.

Some of them were also crippled in firmware so you couldn't use them in RAID1 arrays, but this might have changed.

Isn't spin down controlled by the host and not the drive?
The host can spin down a drive manually, but most often it's done autonomously by the drive.

‣  In Linux you can manually ...

• check the power state of your drive using: hdparm -C /dev/sda

• manually spin down the drive (standby) using: hdparm -y /dev/sda (it will immediately spin up at the first attempt to read a sector)

‣  Or you configure the automatic standby of the drive (which also does not involve the OS)...

• hdparm -S n /dev/sda will configure the timeout of the drive to a value encoding the time to spin-down on a non-linear scale, check the manpage

• hdparm -B n /dev/sda will configure another type of power management which doesn't specify a fixed timeout, but rather a vendor-defined type of arbitrary power saving measures on a scale of 1..254 (1: waste power, 254: conserve power, n>128 allows spin down)

The latter two options are handled internally by the drive and (as far as I know) even stored in non-volatile memory.
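
For the -S scale specifically, here is a small decoder sketch following the encoding described in the hdparm manpage. The helper name hdparm_S_decode is made up; it only translates a value and does not touch any drive.

```shell
# Decode an `hdparm -S` value into a human-readable standby timeout,
# following the encoding described in the hdparm manpage.
hdparm_S_decode() {
    n=$1
    if [ "$n" -lt 0 ] || [ "$n" -gt 255 ]; then
        echo "out of range"
    elif [ "$n" -eq 0 ]; then
        echo "disabled"
    elif [ "$n" -le 240 ]; then
        echo "$((n * 5)) s"               # multiples of 5 seconds
    elif [ "$n" -le 251 ]; then
        echo "$(((n - 240) * 30)) min"    # 1 to 11 units of 30 minutes
    elif [ "$n" -eq 252 ]; then
        echo "21 min"
    elif [ "$n" -eq 253 ]; then
        echo "vendor-defined (8 to 12 h)"
    elif [ "$n" -eq 254 ]; then
        echo "reserved"
    else
        echo "21 min 15 s"                # n is 255
    fi
}
```

For instance, `hdparm_S_decode 120` prints "600 s", i.e. `hdparm -S 120 /dev/sda` asks for a ten minute standby timeout.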

http://linux.die.net/man/8/hdparm

(Edit: fixed my broken English ;-) )

So, actually you got -B backwards. n < 128 allows spin-down.
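
Spelled out as a sketch, going by the hdparm manpage (lower values mean more aggressive power saving, higher values favour performance); apm_level_describe is a hypothetical helper name.

```shell
# Describe what an `hdparm -B` (APM) level permits, per the hdparm manpage:
# 1..127 permit spin-down, 128..254 do not, 255 disables APM altogether.
apm_level_describe() {
    n=$1
    if [ "$n" -ge 1 ] && [ "$n" -le 127 ]; then
        echo "spin-down permitted"
    elif [ "$n" -ge 128 ] && [ "$n" -le 254 ]; then
        echo "spin-down not permitted"
    elif [ "$n" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "out of range"
    fi
}
```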
Some drives like WD's Green series have variable RPM to save power (and therefore heat). I don't think the OS is involved at all with controlling how fast those disks spin (so long as they are spinning at all).
WD's marketroids tried very hard to give people that impression, but it's false.

http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771...

At the very bottom, in small, grey text, you'll find "IntelliPower" defined as:

"A fine-tuned balance of spin speed, transfer rate and caching algorithms designed to deliver both significant power savings and solid performance. For each WD Green drive model, WD may use a different, invariable RPM."

The RED drives you are referring to actually have some of the worst reviews around, even compared to normal consumer drives. (Source: the Newegg reviews you mentioned)
i'm not sure where you're getting that info:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822236...

http://www.amazon.com/WD-Red-NAS-Hard-Drive/dp/B008JJLZ7G

been running 3x of these in a raid-5 NAS, no issues so far (not that it's any kind of indicator on a system which idles as a backup all day)

About half of the Newegg reviews are 3-eggs or lower. And even some of the 5-egg reviews mention dead drives! They only got 5 eggs because of Newegg's easy return policy. The drives are clearly unreliable.
great :(