by craigyk
4293 days ago
My largest ZFS pool is currently ~64TB (3 raidz2 vdevs of 10 × 3TB drives).
The pool has ranged from 85% to 95% full (it's at about 85% now and is used mostly for reads). Resilvering one drive usually takes under 48 hours; the last one took ~32 hours. Something else cool:
When I was planning this out, I wrote a little script to calculate the chance of failure.
With a drive annualized failure rate (AFR) of 10%, which is pretty high, and a rebuild speed of 50MB/s, which is very low compared to what I typically get, I have a 1% chance of losing the entire pool over a 46.6-year period. If I add 4 more raidz2 vdevs of 10 × 3TB drives, that would drop to 3.75 years.
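For anyone who wants to play with the same kind of estimate, here is a minimal sketch of one way to model it. It is not the script described above; it assumes exponential drive lifetimes, treats each extra failure as having to land within one full resilver window, and treats the pool as lost when any vdev loses parity + 1 drives. The function name and parameters are made up for illustration. It is much cruder than whatever the original script did, so its numbers should not be expected to reproduce the figures quoted above.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_loss_probability(
    afr: float,           # annualized failure rate per drive, e.g. 0.10
    drive_bytes: float,   # capacity of one drive in bytes
    rebuild_bps: float,   # resilver speed in bytes/second
    drives_per_vdev: int,
    parity: int,          # 2 for raidz2, 3 for raidz3
    vdevs: int,
    target: float = 0.01,  # probability of losing the whole pool
) -> float:
    """Hypothetical helper: years until P(pool loss) reaches `target`."""
    # Convert AFR to a constant per-drive failure rate (exponential lifetimes).
    lam = -math.log(1.0 - afr)
    # Time to resilver one drive, in years.
    window = drive_bytes / rebuild_bps / SECONDS_PER_YEAR
    # Rate at which the first drive in a vdev fails...
    rate = drives_per_vdev * lam
    # ...times the chance that each further failure (up to parity + 1 total)
    # hits one of the remaining drives within a full resilver window.
    for k in range(1, parity + 1):
        remaining = drives_per_vdev - k
        rate *= 1.0 - math.exp(-remaining * lam * window)
    # Losing any single vdev loses the pool, so the vdev loss rates add up.
    pool_rate = vdevs * rate
    # Solve 1 - exp(-pool_rate * t) = target for t.
    return -math.log(1.0 - target) / pool_rate
```

Usage, with the parameters from the comment: `years_to_loss_probability(0.10, 3e12, 50e6, 10, 2, 3)` for the current pool versus the same call with `vdevs=7`. In this simple model the time shrinks linearly with the number of vdevs; a faster resilver or an extra parity drive pushes it out.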
I'm always mystified at how stupid our storage systems are. Even very expensive SAN solutions from EMC and the like are just... stupid. We've got loads of metrics on every drive, but nobody seems to have figured out yet that those numbers should be aggregated and subjected to statistical analysis.
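As a rough illustration of what "aggregate and analyze" could mean at its simplest, the sketch below groups drive reports by model and flags any drive whose metric sits more than k standard deviations above its model's fleet mean. The `DriveReport` type, the `reallocated_sectors` field, and `flag_outliers` are all hypothetical names chosen for this example; a real system would pull SMART data and use many more signals.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class DriveReport:
    serial: str
    model: str
    reallocated_sectors: int  # hypothetical SMART-derived metric; higher is worse


def flag_outliers(reports: list[DriveReport], k: float = 3.0) -> list[str]:
    """Return serials whose metric exceeds its model's mean by more than k sigma."""
    by_model: dict[str, list[DriveReport]] = defaultdict(list)
    for r in reports:
        by_model[r.model].append(r)

    flagged = []
    for group in by_model.values():
        values = [r.reallocated_sectors for r in group]
        mu = fmean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            continue  # every drive of this model reports the same value
        for r in group:
            if (r.reallocated_sectors - mu) / sigma > k:
                flagged.append(r.serial)
    return flagged
```

One caveat of this naive approach: the outlier inflates sigma itself, and with n drives in a group no z-score can exceed sqrt(n - 1), so groups of 10 or fewer drives can never trip a 3-sigma threshold. Robust statistics (median and MAD) would likely serve better in practice.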
What I really want is a "pasture" system: a place I can stick old drives of totally random sizes and performance characteristics and have the system figure out where to put the data in order to maintain a specific level of reliability. Preferably it would be backed by an online database that tracks the failure rate of every drive on every deployed system, noting patterns such as 'bad batches' of certain models. If one of my drives would have to beat 3-standard-deviation odds to survive the next week, move the damn data somewhere better. And if you've got two 150GB drives and one 300GB drive, then each block on those drives has a rating of 2.0, adjusted for the age and metrics of each drive.
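The 150GB/150GB/300GB example can be made concrete if we assume a rating of 2.0 means every block has two copies on two different drives, ignoring the age/metrics adjustment. Under that assumption, a set of mixed-size drives can hold U units of data at `copies` replicas if and only if `sum(min(c, U)) >= copies * U`, since no drive can hold more than one copy of any block (so at most U) or more than its own capacity. The hypothetical `max_usable` helper below binary-searches for the largest such U, treating data as finely divisible blocks.

```python
def max_usable(capacities: list[int], copies: int) -> int:
    """Hypothetical helper: largest data size U (same units as capacities)
    storable with `copies` replicas per block, each on a distinct drive."""
    if copies < 1 or copies > len(capacities):
        return 0  # can't place more replicas than there are drives
    lo, hi = 0, sum(capacities) // copies
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # Each drive contributes at most one replica per block, up to its capacity.
        if sum(min(c, mid) for c in capacities) >= copies * mid:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Two 150GB drives plus one 300GB drive at a rating of 2.0:
print(max_usable([150, 150, 300], 2))  # 300
```

The feasible region is a single interval starting at zero, which is why a plain binary search works. A real "pasture" would replace the integer replica count with a reliability score weighted by each drive's age and health.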
Oh well, maybe when I retire in 30 years, storage systems will still be as stupid as they've remained for the past 30, and I'll have another project to add to the pile I don't have time to work on now.