|
|
|
|
|
by HelloNurse
245 days ago
|
|
It's the most plausible, fact-based guess, beating other competing theories. Understaffing and absences would clearly lead to delayed incident response, but such an obvious negligence and breach of contract would have been avoided by a responsible cloud provider, ensuring supposedly adequate people on duty. An exceptionally challenging problem is unlikely to be enough to cause so much fumbling because, regardless of the complex mistakes behind it, a DNS misunderstanding doesn't have a particularly large "surface area" for diagnostic purposes and it is supposed to be expeditely resolvable by standard means (ordering clients to switch to a good DNS server and immediately use it to obtain good addresses) that AWS should have in place. AWS engineers being formerly competent but currently stupid, without organizational issues, might be explained by brain damage. "RTO" might have caused collective chronic poisoning, e.g. lead in drinking water, but I doubt Amazon is so cheap. |
|
You seem to be misunderstanding the nature of the issue.
The DNS records for DynamoDB's API disappeared. They resolve to a dynamic bunch of IPs that constantly change.
A ton of AWS services that use DynamoDB could no longer do so. Hardcoding IPs wasn't an option. Nor could clients do anything on their side.