by dekhn 1658 days ago
If you cannot access the control plane to create or destroy resources, it is down (partial availability). The jobs that are running are basically zombies.
3 comments

I'm right in the middle of an AWS-run training and we literally can't run the exercises because of this.

Let me repeat that: my AWS training, which is run by AWS and which I pay AWS for, isn't working because AWS is having control plane (or other) issues. This is several hours after the initial incident. We're doing training in us-west-2, but the identity service and other components run in us-east-1.

I’m running EKS in us-west-2. My pods use a role ARN and identity token file to get temporary credentials via STS. STS can’t return credentials right now. So my EKS cluster is “down” in the sense that I can’t bring up new pods. I only noticed because an auto-scaling event failed.
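
For anyone unfamiliar with that credential exchange, here is a minimal sketch of the request involved, assuming the standard `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables that EKS injects into pods. The function names, the default region, and the session name are hypothetical, and the sketch only builds the `AssumeRoleWithWebIdentity` URL rather than sending it. In practice the AWS SDK does this for you.

```python
import os
from urllib.parse import urlencode


def build_sts_request(role_arn, token, region="us-west-2",
                      session_name="example-session"):
    """Build the URL for an STS AssumeRoleWithWebIdentity call.

    The region and session name defaults are illustrative assumptions.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }
    # Regional STS endpoint; the call itself is not made here.
    return f"https://sts.{region}.amazonaws.com/?{urlencode(params)}"


def request_from_env(env=os.environ):
    """Read the role ARN and token file path from the pod environment."""
    role_arn = env["AWS_ROLE_ARN"]
    with open(env["AWS_WEB_IDENTITY_TOKEN_FILE"]) as f:
        token = f.read().strip()
    return build_sts_request(role_arn, token,
                             region=env.get("AWS_REGION", "us-west-2"))
```

If STS does not answer this request, a new pod has no temporary credentials to start with, which fits with existing pods continuing to run while scale-up fails.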
We ran through the whole 4.5-hour training and the training app didn't work the entire time.
Seems like the API is still working and so is auto scaling. So they aren’t really zombies.

Partial availability isn’t the same as no availability.

The API is NOT working -- it may not have been listed on the service health dashboard when you posted that, but it is now. We haven't been able to launch an instance at all, and we are continuously trying. We can't even start existing instances.
Depending on the workload being run, users may or may not notice. Should be Yellow at a minimum.