|
|
|
|
|
by Joker_vD
894 days ago
|
|
Does it mean that when Blackblaze needs to retrieve me my file, it has to issue 20 parallel network requests, wait for at least 17 of them to complete, then combine the responses into the requested file and only then it can start streaming it to me? That seems kinda bad for latency. |
|
If they make sure they no two shards occupy the same hard disk, they could lose up to three hard disks with your data shared on it and still be able to recreate it. Even if they lose just one, they can immediately reproduce that now missing shard from what they already have. So really you'd need to talk losing 4 hard disks, each with a shard on, nearly simultaneously.
So that's roughly the same durability as you'd get storing 4 copies of the same file. Except in this case it's storing just 1.15x the size of the original file (20:17 ratio). So for every megabyte you store, you need 1.15 megabytes of space instead of 4 megabytes.
The single biggest cost for storage services is not hardware, it's the per rack operational costs, by a long, long stretch. Erasure encoding is the current best way to keep that stretch factor low, and costs under control.
If you think about the different types of storage needs there are, and access speed desires, it's even practical to use much higher ratios. You could, for example, choose 40:34 and get similar resilience to as if you had 8x copies of the file, while still at a 1.15x stretch factor. You just have that draw back of needing to fetch 34 shards at access time. If you want to keep that 4x resilience that could be 40:36 which nets you a nice 1.11x stretch factor. If you had just 1 petabyte of storage, that 0.03 savings would be 30 terabytes, a good chunk of a single server.