by parasubvert
5566 days ago
Generally speaking, this is the sort of thing people warn about when they say "if you want to run on a cloud, you need to design your application for a cloud." That is, you can't presume your infrastructure is dedicated or that it carries MTBFs similar to (say) an enterprise hard drive, which is upwards of 1 million hours. Amazon provides plenty of ways to mitigate this, such as multiple availability zones. Reddit, if you read the original blog post, wasn't designed for that; it was designed for a single data centre.

OTOH, the variability of EBS performance is real, and frustrating. If you do a RAID0 stripe across 4 drives, you can expect around 100 MB/sec sustained, modulo hiccups that can bring it down by a factor of 5. On a cluster compute instance (cc1.4xlarge) it's more like up to 300 MB/sec if you go up to 8 drives, since they provision more network bandwidth and seem to be able to cordon it off better with a placement group.
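A rough sketch of why one slow volume drags the whole stripe down. This is a toy model, not how EBS or Linux md actually schedule I/O: it assumes every full-stripe request waits on the slowest member, and the 25 MB/s per-volume rate is a hypothetical figure chosen only so that four volumes add up to the ~100 MB/sec mentioned above.

```python
# Toy model (assumption): a RAID0 volume splits each full-stripe request
# evenly across all member drives, so the request finishes only when the
# slowest member finishes its share.

def stripe_throughput(per_drive_mb_s: list[float]) -> float:
    """Aggregate MB/s for full-stripe I/O: every member moves the same
    amount of data, so the stripe runs at n x the slowest drive's rate."""
    return len(per_drive_mb_s) * min(per_drive_mb_s)


healthy = [25.0] * 4               # hypothetical 25 MB/s per EBS volume
hiccup = [25.0, 25.0, 25.0, 5.0]   # one volume slowed by a factor of 5

print(stripe_throughput(healthy))  # 100.0
print(stripe_throughput(hiccup))   # 20.0
```

Under this model a factor-of-5 slowdown on a single member is a factor-of-5 slowdown for the whole volume, which matches the "bring it down by a factor of 5" experience.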
The comments on reddit indicated hiccups more on the order of 10x and, sometimes, 100x.
Either way, the issue is that the more drives you add to your RAID0, the more often one of those drives experiences a "hiccup" and kills the performance of the entire volume.
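A back-of-the-envelope sketch of the "more drives, more hiccups" effect, assuming each drive hiccups independently with some fixed probability p at any given moment (the 2% below is a hypothetical number, not a measured EBS figure). The stripe is degraded whenever at least one member is hiccuping, which happens with probability 1 - (1 - p)^n.

```python
# Assumption: drives hiccup independently, each with probability p at a
# given moment; a RAID0 stripe is degraded if ANY member is hiccuping.

def p_any_hiccup(p: float, n: int) -> float:
    """Probability that at least one of n independent drives is hiccuping."""
    return 1.0 - (1.0 - p) ** n


for n in (1, 4, 8):
    # hypothetical per-drive hiccup probability of 2%
    print(n, round(p_any_hiccup(0.02, n), 4))
```

With p = 2%, the chance that the volume is degraded at a given moment grows from 2% for one drive to roughly 7.8% for four and 14.9% for eight, so adding drives buys bandwidth at the cost of hitting the slow path more often.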