For starters, failure modes are not decoupled at all. If one db hits a race condition in the storage driver on one instance, it's all but guaranteed that every other instance is susceptible to the same bug, and it will hit them sooner or later.
In practice, occurrences of a bug are highly correlated and usually happen in batches.
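To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a simplified common-cause ("beta-factor" style) model that is not from the original comment: each instance fails with probability `p` in some window, and a fraction `beta` of those failures comes from a shared cause (the same bug in the same driver) that takes down every instance at once. The function name and the example values are made up for illustration.

```python
def p_all_fail(p: float, n: int, beta: float) -> float:
    """Approximate probability that all n instances fail in the same window.

    p    -- per-instance failure probability (assumed value)
    n    -- number of instances running the same software
    beta -- fraction of failures caused by a shared bug (assumed value);
            0.0 means fully independent, 1.0 means fully correlated
    """
    shared = beta * p                    # one shared trigger hits every instance
    independent = ((1 - beta) * p) ** n  # all n fail on their own, by coincidence
    return shared + independent


if __name__ == "__main__":
    p, n = 0.01, 3
    for beta in (0.0, 0.5, 1.0):
        print(f"beta={beta}: P(all {n} fail) ~ {p_all_fail(p, n, beta):.2e}")
```

With fully independent failures, the chance of losing everything shrinks as `p ** n`. As soon as a meaningful share of failures comes from a shared bug, the result is dominated by `beta * p`, and adding more identical instances barely helps.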