by matt_oriordan
3410 days ago
Well, it's not that simple. We do run in multiple availability zones in every region. But if the connectivity between them is only partially working, which it was, and shared services from Amazon itself aren't working fully from every instance, you have a huge mess to contend with where cluster consensus cannot be formed. So in cases like this we did what we should have done and routed traffic away from a network that was unreliable and partly partitioned. The point for us was not that this availability zone went down at all. It was that Amazon claimed throughout that everything was operating normally for hours when this was very far from the truth.
|
I get that having issues on AWS is irritating; I exist in that ecosystem too. But... I really can't fault them for this, or claim that they're lying. AWS says not to rely on any one AZ being up and/or reachable, and yet you did. And the fact that it caused problems for you means you want them to say the entire region is down. Why? They make regions fault tolerant by having multiple AZs; they guarantee reliability at the region level, not at the AZ level, and that's what the status page is intended to track.
Now, I can see wanting a clear status page per AZ, rather than just a blue 'i'. That's a valid request. But -request- that, don't claim that they're lying. You're being antagonistic despite them doing everything they've promised, and their status page being correct (just not using the colors you would like because they view severity differently than you).