by e12e
3635 days ago
Point well taken - I was confused by the comments about how long 10^27 seconds is, and by the "9s" nomenclature, which is often used to measure uptime. On the other hand, a Dropbox engineer upthread just claimed that their service has an external [ed2: durability] of 11 to 12 9s. So it does seem that they effectively claim that a block will practically never be unavailable because it can't be read from (any) disk [ed2: i.e., unavailable due to failed durability]? I do wonder a bit at the cost of padding redundancy up to such a high number. They don't mention block size, other than to say that 1 GB is filled with actual blocks. Let's say it's 1 MB, and they target 1 billion users, averaging 100 GB of data stored. That's 10⁹ users storing 10² GB each, with 10³ blocks per GB, or 10⁹⁺²⁺³ = 10¹⁴ blocks of data. That still leaves a lot of margin - and effectively the storage part of the system should never be the weakest link. [ed: and that some users are likely to lose some data, if the 11 to 12 9s figure is to be taken "per block". But maybe it's per user? It seems unlikely that they really mean 11 9s of availability full stop] [ed2: Snuck in an extra availability where I meant durability, rendering this second comment nonsensical too...]
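For anyone who wants to check the back-of-envelope numbers, here is a rough sketch in Python. Everything in it is an assumption from this comment, not a published Dropbox figure: the 1 MB block size, the 1 billion users, the 100 GB average, and above all the reading of "N 9s" as a per-block loss probability of 10^-N over whatever period the figure covers. The function names are made up for illustration.

```python
def total_blocks(users: int, gb_per_user: int, blocks_per_gb: int) -> int:
    """Total number of stored blocks under the assumed workload."""
    return users * gb_per_user * blocks_per_gb


def expected_lost_blocks(blocks: int, nines: int) -> float:
    """Expected number of lost blocks if each block is lost
    independently with probability 10**-nines (an assumed reading
    of 'N 9s of durability', taken per block)."""
    return blocks * 10.0 ** (-nines)


# Assumed workload: 10^9 users, 10^2 GB each, 10^3 blocks of 1 MB per GB.
blocks = total_blocks(10**9, 10**2, 10**3)
print(f"blocks stored: {blocks:.0e}")

for nines in (11, 12):
    lost = expected_lost_blocks(blocks, nines)
    print(f"{nines} nines -> expected lost blocks: {lost:.0f}")
```

Under the per-block reading, the expected loss is simply the block count times the per-block loss probability, which is why the interpretation (per block vs. per user) matters so much.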
|
I could make a system that was only available for 1 minute every 10 minutes (10% availability) but never lost a single file (100% durability).
I could also make a system that was never down (100% availability) but would randomly lose 1 in 10 files (90% durability).
Common causes for availability issues are power outages, network outages (fiber cuts, etc.), and DNS issues.
Common causes for durability issues are software bugs, entire datacenter losses (e.g. an earthquake destroys everything), or perhaps coordinated power outages if the system buffers things in volatile memory.
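To put the two hypothetical systems above into numbers, here is a minimal Python sketch. The function names are made up for illustration, and "nines" is taken to mean -log10(1 - fraction), which is one common convention, not an official definition.

```python
import math


def availability(up_minutes: float, total_minutes: float) -> float:
    """Fraction of time the system can be reached."""
    return up_minutes / total_minutes


def durability(files_stored: int, files_lost: int) -> float:
    """Fraction of stored files that are never lost."""
    return (files_stored - files_lost) / files_stored


def nines(fraction: float) -> float:
    """Express a fraction as a count of nines (0.999 is about 3).
    A perfect 1.0 is reported as infinity."""
    if fraction >= 1.0:
        return math.inf
    return -math.log10(1.0 - fraction)


# System A: up 1 minute in every 10, never loses a file.
a_avail, a_dur = availability(1, 10), durability(10, 0)
# System B: never down, loses 1 file in 10.
b_avail, b_dur = availability(10, 10), durability(10, 1)

print(f"A: availability={a_avail:.0%}, durability={a_dur:.0%}")
print(f"B: availability={b_avail:.0%}, durability={b_dur:.0%}")
print(f"B durability in nines: {nines(b_dur):.2f}")
```

The two measures are computed from entirely different inputs (time reachable vs. files retained), so a figure quoted in 9s says nothing until you know which of the two it refers to.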