|
Yes, you pay a cost for latency, but you get a phenomal amount of durability at much lower stretch factor. If they make sure they no two shards occupy the same hard disk, they could lose up to three hard disks with your data shared on it and still be able to recreate it. Even if they lose just one, they can immediately reproduce that now missing shard from what they already have. So really you'd need to talk losing 4 hard disks, each with a shard on, nearly simultaneously. So that's roughly the same durability as you'd get storing 4 copies of the same file. Except in this case it's storing just 1.15x the size of the original file (20:17 ratio). So for every megabyte you store, you need 1.15 megabytes of space instead of 4 megabytes. The single biggest cost for storage services is not hardware, it's the per rack operational costs, by a long, long stretch. Erasure encoding is the current best way to keep that stretch factor low, and costs under control. If you think about the different types of storage needs there are, and access speed desires, it's even practical to use much higher ratios. You could, for example, choose 40:34 and get similar resilience to as if you had 8x copies of the file, while still at a 1.15x stretch factor. You just have that draw back of needing to fetch 34 shards at access time. If you want to keep that 4x resilience that could be 40:36 which nets you a nice 1.11x stretch factor. If you had just 1 petabyte of storage, that 0.03 savings would be 30 terabytes, a good chunk of a single server. |