by cmurf 2564 days ago
It is incredibly small if you don't consider either drive failing. But if one drive fails, it happens with some regularity that a sector on the good drive is bad. In actuality, only one sector is bad, but in effect the dead drive means its mirror is also bad.

This comes up on the linux-raid list with some frequency whenever a drive fails in a raid56 array and the array subsequently trips over a single bad sector.

But it's true that lack of scrubbing contributes to this scenario, as does the terrible combination of consumer drives with very long bad sector recovery times and the Linux SCSI command timer default of 30 seconds. That combination masks bad sectors, which then never get repaired, and as a user you may not realize that the link resets are not normal and point to a bad sector as the cause.


Are you saying that a failure happens which isn’t detected and when the 2nd failure occurs we notice because the data is inaccessible?

Which RAID software does this?

Correct. All of them that depend on the SCSI block layer, which includes libata and thus common consumer SATA drives. A NAS or better drive will come out of the box with short error timeouts, typically 70 deciseconds (7 seconds), and will quickly issue a read error with the LBA of the offending bad sector. The RAID then knows to obtain a copy or reconstruct from parity, and write the good data back to the bad sector, thus fixing it. Either the write works, or if it fails the drive firmware is responsible for remapping that LBA to a reserve physical sector.
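To make the repair path concrete, here is a toy sketch of the mirror case. It is not how md, Btrfs, or any real RAID layer is written; `ToyDrive`, `ReadError`, and `mirrored_read` are hypothetical names invented for illustration, and the "drives" are just in-memory dictionaries.

```python
class ReadError(Exception):
    """Raised when a (toy) drive reports an unreadable sector."""

    def __init__(self, lba):
        super().__init__(f"unrecoverable read error at LBA {lba}")
        self.lba = lba


class ToyDrive:
    """Hypothetical in-memory drive: maps LBA -> bytes, plus a set of bad LBAs."""

    def __init__(self, sectors, bad=()):
        self.sectors = dict(sectors)
        self.bad = set(bad)

    def read(self, lba):
        # A drive with a short error timeout fails fast and names the LBA.
        if lba in self.bad:
            raise ReadError(lba)
        return self.sectors[lba]

    def write(self, lba, data):
        # A successful write (or a firmware remap) makes the LBA readable again.
        self.sectors[lba] = data
        self.bad.discard(lba)


def mirrored_read(primary, mirror, lba):
    """Read from primary; on a read error, fetch the mirror copy and rewrite it."""
    try:
        return primary.read(lba)
    except ReadError as err:
        good = mirror.read(err.lba)   # raises too if the mirror copy is also bad
        primary.write(err.lba, good)  # repair the bad sector in place
        return good
```

The point of the sketch is that the repair only happens because the read error arrives with an LBA attached. If the mirror is dead, the second `read` has nowhere to go, which is the single-bad-sector failure described earlier in the thread.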

In the case where the drive's error timeout is longer than the SCSI command timer, it just results in a link reset. The actual problem with the drive is obscured by the reset, including the bad sector, so it never gets repaired.
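A rough way to check for this mismatch on one drive, as a sketch only. It assumes the kernel command timer is exposed in seconds at `/sys/block/<dev>/device/timeout`, and that `smartctl -l scterc` prints a line like `Read: 70 (7.0 seconds)` with the value in deciseconds; the function names are made up for this example, and the output format may differ between smartmontools versions.

```python
import re
from pathlib import Path


def read_scsi_timeout(dev, sysfs_root="/sys/block"):
    """Return the kernel command timer for a device such as 'sda', in seconds."""
    return int(Path(sysfs_root, dev, "device", "timeout").read_text().strip())


def parse_scterc_read(smartctl_output):
    """Extract the read ERC value (deciseconds) from `smartctl -l scterc` text.

    Returns None when no numeric read value is found, which is assumed here
    to mean ERC is disabled or unsupported.
    """
    match = re.search(r"^\s*Read:\s+(\d+)", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def has_timeout_mismatch(erc_deciseconds, scsi_timeout_seconds):
    """True when the drive may still be retrying after the kernel has given up."""
    if erc_deciseconds is None:
        # Unknown or disabled ERC: assume the drive can outlast the kernel timer.
        return True
    return erc_deciseconds / 10 >= scsi_timeout_seconds
```

With a drive reporting 70 deciseconds and the default 30 second command timer, `has_timeout_mismatch(70, 30)` is false: the drive gives up first and the read error reaches the RAID layer. With no ERC value at all, the sketch errs on the side of reporting a mismatch.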

Btrfs, mdadm, and LVM are affected, and I'm pretty sure ZFS on Linux is as well, assuming they haven't totally reimplemented their own block layer outside of the SCSI subsystem.

It's a super irritating problem, and the kernel developers know all about it, but thus far it's considered something distributions should change for the use cases that need it. And what that means so far is that distros don't change it, and users with consumer drives that have long error recovery times get bitten.
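For anyone who wants to change it themselves, the usual choice is between shortening the drive's error recovery time and lengthening the kernel's command timer. A hedged sketch of that decision follows; `plan_timeout_fix` is a hypothetical helper that only describes the change and does not apply it, and the values 70 deciseconds and 180 seconds are commonly suggested numbers used here as assumptions, not requirements.

```python
def plan_timeout_fix(dev, supports_scterc):
    """Describe the change that would remove the mismatch for one drive.

    dev is a kernel device name such as 'sda'. Nothing is executed here;
    the caller decides whether and how to apply the returned plan.
    """
    if supports_scterc:
        # Ask the drive to give up after 7 seconds on reads and writes.
        return {
            "action": "run",
            "argv": ["smartctl", "-l", "scterc,70,70", f"/dev/{dev}"],
        }
    # The drive cannot be told to fail fast, so make the kernel wait longer.
    return {
        "action": "write",
        "path": f"/sys/block/{dev}/device/timeout",
        "value": "180",  # seconds; assumed value, well above the 30 second default
    }
```

The sysfs value does not survive a reboot, and on many drives the ERC setting does not survive a power cycle either, so either change would typically need to be reapplied at each boot.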

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

The link you posted talks about the RAID software kicking a whole disk out of the array when the disk takes too long to respond, due (basically but not exactly) to a mismatch between two timeout settings.

The post I was responding to implied a raid array could be degraded and you wouldn’t know till it completely failed

Interesting nevertheless