

by cmurf 2612 days ago

Correct. This applies to everything that depends on the SCSI block layer, which includes libata and thus common consumer SATA drives. A NAS or better drive comes out of the box with a short error recovery timeout, typically 70 deciseconds (7 seconds), and quickly issues a read error with the LBA of the offending bad sector. The RAID then knows to obtain a copy or reconstruct from parity, and writes the good data back to the bad sector, thus fixing it. Either the write works or, if it fails, the drive firmware is responsible for remapping that LBA to a reserve physical sector.
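To make the "reconstruct from parity" step concrete, here is a minimal sketch assuming single-parity (RAID 5 style) XOR. The function name `reconstruct_block` is made up for illustration, and real implementations in md or Btrfs are considerably more involved.

```python
from functools import reduce
from typing import Sequence


def reconstruct_block(surviving_blocks: Sequence[bytes]) -> bytes:
    """Recover the one missing block of a stripe under single-parity XOR.

    surviving_blocks holds every other block of the stripe (the remaining
    data blocks plus the parity block), all of equal length. XORing them
    byte by byte yields the block that could not be read.
    """
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR one byte position across blocks
        for column in zip(*surviving_blocks)
    )


# Hypothetical two-data-block stripe: parity is the XOR of the data blocks.
d0 = b"\x01\x02"
d1 = b"\x10\x20"
parity = bytes(a ^ b for a, b in zip(d0, d1))

# Pretend d1 returned a read error: rebuild it from d0 and parity.
recovered = reconstruct_block([d0, parity])
```

The recovered bytes are what gets written back to the LBA the drive reported, which is why the drive needs to report the error promptly in the first place.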

In the case where the drive's error recovery timeout is longer than the SCSI block layer's command timeout (30 seconds by default), the result is just a link reset. The actual problem with the drive, including which sector is bad, is obscured by the reset, so it never gets repaired.
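A rough sketch of the comparison, assuming the kernel's per-device command timeout is read from `/sys/block/<dev>/device/timeout` (in seconds) and the drive's error recovery limit is known in deciseconds, for example as reported by `smartctl -l scterc`. The function names and the choice to treat a disabled or unsupported limit as a mismatch are my own assumptions.

```python
from pathlib import Path
from typing import Optional


def read_kernel_timeout_s(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Read the SCSI command timeout in seconds for a device such as 'sda'."""
    return int(Path(sysfs_root, dev, "device", "timeout").read_text().strip())


def has_timeout_mismatch(erc_deciseconds: Optional[int], kernel_timeout_s: int) -> bool:
    """Return True if the drive may still be retrying when the kernel gives up.

    erc_deciseconds is the drive's error recovery limit in tenths of a second,
    or None when the limit is disabled or unsupported (assumption: recovery
    time is then unbounded, so it is treated as a mismatch).
    """
    if erc_deciseconds is None:
        return True
    # Convert deciseconds to seconds before comparing with the kernel value.
    return erc_deciseconds / 10 >= kernel_timeout_s


# 70 deciseconds (7 s) against the default 30 s kernel timeout: no mismatch.
nas_drive = has_timeout_mismatch(70, 30)
# No recovery limit on the drive against the same 30 s: mismatch.
consumer_drive = has_timeout_mismatch(None, 30)
```

When the check comes back true, the usual remedies are to shorten the drive's recovery limit if it supports one, or to raise the kernel timeout for that device.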

Btrfs, mdadm, and LVM are affected, and I'm pretty sure ZFS on Linux is as well, assuming they haven't totally reimplemented their own block layer outside of the SCSI subsystem.

It's a super irritating problem. The kernel developers know all about it, but thus far it's considered something distributions should change for the use cases that need it. What that means so far is that distros don't change it, and users with consumer drives that have high error recovery times get bitten.

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

1 comment

zaphirplane 2612 days ago

The link you posted talks about the RAID software kicking a whole disk out of the array when the disk takes too long to respond, due (basically, but not exactly) to a mismatch between two timeout settings.

The post I was responding to implied a raid array could be degraded and you wouldn’t know till it completely failed

Interesting nevertheless
