by dilyevsky 2779 days ago
We had an issue a few weeks back where all nodes in west1-a could not pull Docker images. Google support was pinballing the P1 issue around the globe and across multiple teams for a few days until I root-caused it for them: it turned out to be GCE service account issues affecting the entire zone. It took 2 days to roll back (no status page update). I know nobody gives a fuck, but I can’t help but feel vindicated as an ex-Google SRE.
1 comments

I think a lot of people give a fuck here; I do, at least. Thanks for outlining it, these things are fascinating (to me anyway, who has never worked in IT/ops).