

	by notacoward 1923 days ago
	In a large enough system, over a long enough time, even rare failure modes become inevitable. I was hearing about RAID-6 insufficiency at national labs ten years ago. Rebuild times were already long enough that, sooner or later, a second and then third failure would hit the same RAID group during the first rebuild. Data go poof. Since then, I've worked on even larger storage systems and seen overlapping failures cause data loss with even higher levels of redundancy. Throughout, I've seen the performance degradation from overlapping long rebuilds cause system-wide performance to drop below acceptable levels. Higher areal density won't improve rebuild times unless internal transfer time is the bottleneck (it's not), and it very much does matter if rebuilds take a month. If that additional capacity isn't accompanied by proportional amounts of external-interface bandwidth and CPU/memory somewhere, then bigger disks will mean more risk of data loss. The math is unforgiving.
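
The arithmetic behind that last point can be sketched roughly as follows. This is a back-of-envelope model, not a measurement: the rebuild rate, the annualized failure rate (AFR), the group width, and every function name here are assumptions chosen for illustration, and drive failures are treated as independent, which real correlated failures (same batch, same enclosure) will violate in the pessimistic direction.

```python
import math

HOURS_PER_YEAR = 365 * 24

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite one whole drive at a sustained rebuild rate."""
    seconds = (capacity_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600

def p_drive_fails(window_hours: float, afr: float) -> float:
    """Chance one drive fails within the window, assuming a constant failure rate."""
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * window_hours)

def p_at_least(k: int, n: int, p: float) -> float:
    """Chance that at least k of n independent drives fail, each with probability p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def p_group_lost(capacity_tb: float, rebuild_mb_per_s: float,
                 drives: int, parity: int, afr: float) -> float:
    """Chance a group that is already rebuilding one drive loses data before it finishes."""
    window = rebuild_hours(capacity_tb, rebuild_mb_per_s)
    p = p_drive_fails(window, afr)
    # One drive is already gone, so `parity` further failures among the
    # survivors push the group past what it can tolerate.
    return p_at_least(parity, drives - 1, p)

if __name__ == "__main__":
    # Hypothetical inputs: 100 MB/s effective rebuild rate, 10-drive RAID-6, 2% AFR.
    for tb in (4, 12, 100):
        hours = rebuild_hours(tb, 100)
        risk = p_group_lost(tb, 100, drives=10, parity=2, afr=0.02)
        print(f"{tb:>4} TB: rebuild {hours:7.1f} h, loss risk per rebuild {risk:.2e}")
```

The per-rebuild risk looks small for one group. Multiply it by the number of groups and the number of rebuilds per year in a large installation and it stops being small, and it grows with capacity whenever the rebuild rate does not grow with it.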


effie 1922 days ago

> In a large enough system, over a long enough time, even rare failure modes become inevitable.

Of course rare failures and loss of data do happen. There is no storage strategy that prevents these with certainty.

Data loss and performance degradation should be expected and designed for. Maybe RAID6 isn't cutting it for petabyte projects, but it is fine for the vast majority of RAID users (small businesses, <12TB arrays).

I've noticed that the special hardware and design requirements of the few largest operators are somehow proselytized as a standard that everybody should adopt. People just like to talk about how they understand the biggest deployments in the world and how that is the best practice for everybody. But for most users of RAID, these bigboy strategies are irrelevant. Arrays below 12TB are very common and work acceptably well with RAID5 / RAID6, and an occasional stripe failure very often isn't a big deal for home users or small businesses.

> Higher areal density won't improve rebuild times unless internal transfer time is the bottleneck (it's not), and it very much does matter if rebuilds take a month.

Why? It matters only if running in a degraded state poses performance/reliability problems to users. Which means the array wasn't designed with proper redundancy and performance in the first place. That is the problem, whether the rebuild takes a day or a month. Large drives 100TB will be fine if enough of them is used in the array so it works well in degraded state. Also, the URE (unrecoverable read error) rate will most probably go down due to better ECC measures with 100TB drives.
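
To put a rough number on the URE point: a rebuild has to read every surviving drive in full, so the chance of hitting at least one unrecoverable read error depends on how many bits are read and on the per-bit error rate. The sketch below assumes errors are independent per bit and uses spec-sheet style rates (1e-14 and 1e-15 per bit) purely as illustrative inputs; those figures are usually quoted as upper bounds, and the function names are made up for this example.

```python
import math

def rebuild_read_bytes(drives: int, capacity_tb: float) -> float:
    """Bytes read from the surviving drives when one drive of the array is rebuilt."""
    return (drives - 1) * capacity_tb * 1e12

def p_ure_during_read(bytes_read: float, ure_per_bit: float) -> float:
    """Chance of at least one URE while reading bytes_read, assuming independent bits."""
    bits = bytes_read * 8
    # 1 - (1 - ure_per_bit) ** bits, written with log1p/expm1 so the tiny
    # per-bit rate is not lost to floating-point rounding.
    return -math.expm1(bits * math.log1p(-ure_per_bit))

if __name__ == "__main__":
    # Hypothetical arrays: 4 x 4TB (12 TB read per rebuild) and 4 x 100TB.
    for drives, tb in ((4, 4), (4, 100)):
        to_read = rebuild_read_bytes(drives, tb)
        for rate in (1e-14, 1e-15):
            p = p_ure_during_read(to_read, rate)
            print(f"{drives} x {tb}TB, URE rate {rate:g}/bit: P(>=1 URE) = {p:.3f}")
```

Under these assumptions a single-parity rebuild that hits a URE loses at least that stripe, while a double-parity array with only one drive missing can still correct it. That is why the outcome is so sensitive to whether the per-bit rate really does improve as drives grow.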


notacoward 1922 days ago

> Large drives 100TB will be fine if enough of them is used in the array

So on one hand you say that "big boy stuff" doesn't matter to anyone else, but on the other you say that "proper redundancy" requires higher scale. Seems a bit Goldilocks-ish to me, or perhaps even a bit slippery. There's a pretty well established trend, especially in storage, of things that happen in large systems becoming very relevant to smaller ones over time. RAID itself was considered a super-high-end niche once. And don't assume that my knowing about the high end means I don't know the low end as well, or make appeals to authority on that basis. Rebuild times have always been an issue worth addressing, from 1994-95 when I was working on the then-highest-density disk array (IBM 7135/110) to now, from high-end HPC to SOHO. Don't act like you occupy some magical space where what's true everywhere else is not true as well.


effie 1922 days ago

Regarding "bigboy stuff", it is really a simple argument, so let me repeat it in simpler words. Extreme data reliability beyond RAID6 is important for some specific deployments where loss of data is unacceptable, say for a unique experiment at CERN or a long supercomputer job that can't be repeated. But such a strategy is also needlessly costly for other, less critical RAID users. The latter group of operators is many times bigger, and this is often not reflected in these "RAID5/RAID6 is obsolete" discussions.

I agree with you that in time, the high-end tech becomes the standard tech. But that takes some time. There is quite a non-magical space of small providers who do not care for super reliable storage or super fast rebuilds, and this will be the case for a long time. Yes, the faster the rebuild the better, and "it is a concern" is fine. A one-week or one-month rebuild can be lived with. There is nothing magical about one day, one week or one month. They are all very short compared to a typical drive lifespan.

At the same time, yes, I believe 100TB drives, if they come, will be used in those extremely reliable big deployments, simply because of better TCO and the expansion of data. Even if rebuild times are longer than today, I believe it can be made to work reliably.
