by scott113341 2485 days ago
I got paged 50 minutes before AWS updated their status page. We are running on AWS's managed Kubernetes offering (EKS), and about one-third of our nodes were running in the affected availability zone. We were then able to move all of our traffic out of that AZ, which solved our issues. The main symptom was HTTP requests made by our backend to third-party APIs failing, but only for requests originating from that AZ.