Hacker News
by jrnkntl 3926 days ago
cmd+f redundancy, availability, durability, 99: 0 results.

The only relevant statement I can find is on https://www.backblaze.com/b2/why-b2.html: "the B2 Cloud Storage service has layers of redundancy to ensure data is durable and available". What exactly that means, or what it translates to, is nowhere to be found. If you want corporations or developers to use your storage services for their precious data, I'd be a bit more specific.

3 comments

Brian from Backblaze here: We're really transparent about our redundancy. We use 17+3 Reed Solomon across 20 computers in 20 different locations in our datacenter. You can read about it here: https://www.backblaze.com/blog/vault-cloud-storage-architect...
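
For readers unfamiliar with the notation, here is a rough sketch of what a 17+3 layout implies, assuming the usual reading of 17 data shards plus 3 parity shards where any 17 of the 20 are enough to rebuild a file. The function name `erasure_code_summary` is made up for illustration and is not part of any Backblaze tooling.

```python
from fractions import Fraction


def erasure_code_summary(data_shards: int, parity_shards: int) -> dict:
    """Summarize a data+parity erasure-coding layout (illustrative only)."""
    total = data_shards + parity_shards
    return {
        # One shard per machine in the layout described above.
        "total_shards": total,
        # Up to this many shards can be lost and the data is still recoverable.
        "max_lost_shards": parity_shards,
        # Raw bytes stored per byte of user data.
        "storage_overhead": Fraction(total, data_shards),
    }


summary = erasure_code_summary(17, 3)
print(summary["total_shards"], summary["max_lost_shards"])
print(float(summary["storage_overhead"]))
```

Under that assumption, the raw storage used is 20/17 of the data size (roughly 18% extra), compared with 3x for plain triple replication.
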
I have read plenty of posts from Backblaze in the past including the linked post, but I admit I also wanted to see details about the replication factor on the marketing site for B2.
Compare that to S3: while they claim 11 9s of durability, they only commit to 99% uptime on a monthly basis (98% for their new S3 intermittent availability offering):

https://aws.amazon.com/s3/sla/

They actually commit to 99.9% and 99% respectively; below those levels they begin to give credit back. They credit back at a higher rate below 99% and 98%.
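
To put those percentages in perspective, here is a small sketch that converts a monthly uptime figure into the downtime it permits. It assumes a 30-day month for simplicity (the SLA's own definition of the billing period may differ), and `allowed_downtime_minutes` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_percent: str) -> Decimal:
    """Minutes of downtime per 30-day month permitted at a given uptime %."""
    # Decimal avoids float rounding noise in values such as 100 - 99.9.
    downtime_fraction = (Decimal(100) - Decimal(uptime_percent)) / Decimal(100)
    return downtime_fraction * MINUTES_PER_30_DAY_MONTH


for pct in ("99.9", "99", "98"):
    print(pct, allowed_downtime_minutes(pct))
```

On that basis, 99.9% allows about 43 minutes of downtime a month, 99% allows 7.2 hours, and 98% allows 14.4 hours.
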
brianwski below shares a good link...but you make a good point jrnkntl that we should talk more about it on the site. We'll plan to add more content around that on the Why B2 page. Thanks!
Please go into detail about how failures are detected and handled – e.g. how often is the archive scrubbed, will bitrot be detected on access, etc. Those details are really important for comparing services.
This is a very important point. A lot of these products gloss over those "details", because the numbers might not look good.

E.g. on B2 if you wanted to retrieve data to do your own scrub/validation it would cost you the equivalent of 10 months of storage just to do one retrieval: $0.005/GB/month to store, $0.05/GB to download.

Google Cloud Storage Nearline has the same problem: $0.01/GB/month to store, $0.12/GB for egress. But at least in this case you can egress for free to Compute Engine, so you would only need to pay $0.01/GB for retrieval.

So it's not possible (at reasonable cost) to do your own validation of what's stored in B2. In Google's case, as long as you're willing to use their cloud computers, validating your data once a month doubles your cost.

In conclusion, you're trusting the vendors to handle failures, because it's very expensive to check your data yourself.
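
The arithmetic in this comment can be checked with a short sketch. The prices are the ones quoted above and may not match current pricing; the function names are invented for illustration.

```python
from fractions import Fraction


def months_of_storage_per_read(storage_per_gb_month: str, read_per_gb: str) -> Fraction:
    """How many months of storage one full read of the data costs."""
    return Fraction(read_per_gb) / Fraction(storage_per_gb_month)


def monthly_cost_multiplier(
    storage_per_gb_month: str, read_per_gb: str, reads_per_month: int = 1
) -> Fraction:
    """Total monthly cost relative to storage alone, with regular full reads."""
    per_read = months_of_storage_per_read(storage_per_gb_month, read_per_gb)
    return 1 + reads_per_month * per_read


# Prices as quoted in the comment above (USD per GB).
b2_months = months_of_storage_per_read("0.005", "0.05")
nearline_egress_months = months_of_storage_per_read("0.01", "0.12")
nearline_gce_multiplier = monthly_cost_multiplier("0.01", "0.01")

print(b2_months, nearline_egress_months, nearline_gce_multiplier)
```

With those figures, one full download from B2 costs as much as 10 months of storage, a full Nearline egress costs 12 months, and a monthly validation inside Compute Engine doubles the Nearline bill.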

Does AWS provide this info for S3? I believe they do not, and only provide a percentage of durability maintained.
I'm definitely asking because those are questions to ask every vendor. Don't forget, also, that this space includes things like running your own services using Swift, Ceph, etc., so it's certainly possible to answer those questions definitively for at least some other options.