by discordianfish 211 days ago
Indeed, nothing about the root issues is particularly surprising, but why they missed a critical service panicking across their fleet never gets explained.

My best guess is too many alerts firing without a clear hierarchy and no way to separate cause from effect. It's a typical challenge, but I wish they would shed some light on it. And it's a bit concerning that improving observability is not part of their follow-up steps.