What it means is that in any given year, there is a 1 in 10,000 chance that a data loss event occurs. It doesn’t stack like that.
If you had light bulbs that lasted 1,000 hrs on average, and you had 10k light bulbs and turned them all on at once, they would still last 1,000 hours on average. Some would die earlier and some later, but the top-line number does not tell you anything about the distribution, only the average (mean). That’s what MTTF is: the mean time to failure for a given part. (Strictly, the point where a part is more likely to have failed than not is the median, which only matches the mean when the distribution is symmetric.) It doesn’t tell you if the distribution of light bulbs burning out is 10 hrs or 500 hrs wide. If it’s the latter, you’ll start seeing bulbs out within 750 hrs, but if the former it’d be 995 hrs before anything burned out.
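To make the "same mean, different width" point concrete, here is a rough simulation. It assumes lifetimes are spread uniformly over a window centred on the mean, which is only an illustrative choice (real failure distributions are rarely uniform), and `simulate_bulbs` is a made-up helper name.

```python
import random
import statistics

def simulate_bulbs(mean_hours, width_hours, count, seed=0):
    """Draw `count` lifetimes uniformly from a window of `width_hours` centred on the mean.

    The uniform distribution is an assumption made purely for illustration.
    """
    rng = random.Random(seed)
    low = mean_hours - width_hours / 2
    high = mean_hours + width_hours / 2
    return [rng.uniform(low, high) for _ in range(count)]

# Same mean (1,000 h), two very different spreads.
for width in (10, 500):
    lifetimes = simulate_bulbs(1000, width, 10_000)
    print(
        f"width={width:>3} h  "
        f"mean={statistics.mean(lifetimes):7.1f} h  "
        f"first failure={min(lifetimes):6.1f} h"
    )
```

Both runs report a mean close to 1,000 hrs, but the first bulb dies near 995 hrs in the narrow case and near 750 hrs in the wide case.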
Object integrity isn’t part of the S3 SLA. I assume that is mostly because object integrity is something AWS can’t know about per se.
You could unknowingly upload a corrupted file, for example. By the time you discover that, there may not be a clear record of operations on that object. (Yes, you can record S3 data plane events but that’s not the point.)
Only the customer would know if their data is intact, and only the customer can ensure that.
The best S3 (or any storage system) can do is say “this is exactly what was uploaded”.
And you can overwrite files in S3 with the appropriate privileges. S3 will do what you ask if you have the proper credentials.
Otherwise, S3 is designed to be self-healing, using erasure coding and storing data redundantly across multiple Availability Zones per region (at least three for the standard storage class).
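The "this is exactly what was uploaded" guarantee can be sketched with a plain content digest. This is a minimal illustration using only `hashlib`, not the S3 API; `sha256_hex` and `is_intact` are hypothetical helper names, and where you keep the recorded digest is up to you.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(downloaded: bytes, recorded_digest: str) -> bool:
    """True if the downloaded bytes match the digest recorded at upload time."""
    return sha256_hex(downloaded) == recorded_digest

original = b"quarterly-report-v1"
recorded = sha256_hex(original)  # keep this somewhere you control, before uploading

print(is_intact(original, recorded))                # True
print(is_intact(b"quarterly-report-v2", recorded))  # False
```

Note what this does and doesn't prove: a matching digest only says the bytes are the ones that were hashed. If the file was already corrupted before you hashed and uploaded it, the check still passes.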
Yes, but my point stands. If AWS added S3 data integrity to the SLA then it has now made that commitment contractually. If you add checksum data, the checksums would (logically) be required and also be in scope of the SLA. If there were a mismatch between the checksum and the file, yet the file still functioned, it would be impossible to sanely adjudicate who is responsible for the discrepancy, or what the nature of that discrepancy might be if no other copies of the file exist.
AWS probably doesn’t want those risks and ambiguities.
I generally see object storage systems advertise 11 9s of durability. You would usually see a commercial distributed file system advertise less, trading durability for performance (obviously stuff like Ceph and Lustre will depend on your specific configuration).
In general, if you actually do the erasure coding math, almost all distributed storage systems that use erasure coding will have waaaaay more than 11 9s of theoretical durability.
S3's original implementation might have only had 11 9s, and it just doesn't make sense to keep updating this number; beyond a certain point it's just meaningless.
Like "we have 20 nines" "oh yeah, well we have 30 nines!"
To give an example of why this is the case, if you go from a 10:20 sharding scheme (any 10 of 20 shards are enough to recover the data) to a 20:40 sharding scheme, your storage overhead is the same (2x), but you have roughly doubled the number of nines.
So it's quite easy to get a ton of theoretical 9s with erasure coding
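Here's a back-of-the-envelope version of that math. It assumes each shard is lost independently with probability `p` before it can be repaired, which is a big simplification (it ignores correlated failures and repair rates), and the value `p = 0.01` is picked arbitrarily for illustration. A k:n scheme loses data only when more than n - k shards are gone.

```python
import math

def loss_probability(k: int, n: int, p: float) -> float:
    """Probability that more than n - k of n shards are lost.

    Assumes independent shard failures with probability p (a simplification).
    """
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n - k + 1, n + 1)
    )

def nines(probability: float) -> float:
    """Express a loss probability as a count of nines of durability."""
    return -math.log10(probability)

p = 0.01  # assumed per-shard loss probability, for illustration only
for k, n in ((10, 20), (20, 40)):
    print(f"{k}:{n}  overhead={n / k:.0f}x  nines={nines(loss_probability(k, n, p)):.1f}")
```

Under these assumptions the 10:20 scheme lands at around 17 nines and the 20:40 scheme at around 31, for the same 2x overhead.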
It's really not that impressive, but you have to use erasure coding (chop the data D into X parts, use these to generate Y extra pieces, and store all X+Y of them) instead of replication (store D n times).
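For anyone who hasn't seen the "chop into X, generate Y extra" idea in action, here is the simplest possible case: X = 4 data shards plus Y = 1 XOR parity shard, which survives the loss of any single shard. Real systems typically use Reed-Solomon-style codes to get Y > 1; this sketch is not that, and `split`, `xor_bytes`, and `parity` are names invented for the example.

```python
from functools import reduce

def split(data: bytes, parts: int) -> list[bytes]:
    """Chop data into `parts` equal-size shards, zero-padding the tail."""
    size = -(-len(data) // parts)  # ceiling division
    padded = data.ljust(size * parts, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(parts)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity(shards: list[bytes]) -> bytes:
    """XOR all shards together."""
    return reduce(xor_bytes, shards)

data = b"erasure coding demo!"
shards = split(data, 4)               # X = 4 data shards
stored = shards + [parity(shards)]    # Y = 1 extra (parity) shard

lost = 2                              # pretend shard 2 is gone
survivors = [s for i, s in enumerate(stored) if i != lost]
rebuilt = parity(survivors)           # XOR of the 4 survivors recovers the lost shard

print(rebuilt == shards[lost])        # True
```

Storage cost here is 5/4 of the original, versus 2x for keeping a second full copy, and both survive one lost piece.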