by kachapopopow 223 days ago
because these systems are so big, and the people who can validate problems might be asleep at the wheel or pretty far up the chain, so it takes time to reach them. most of the spikes on downdetector are unrelated to the service itself and come from a 3rd party failure.

IMO if you have an endpoint or service on your status page, you most definitely have an on-call rotation for it. Regarding the second point, your service might be down due to an AWS outage. It's an upstream issue and I fully understand that, but I should not have to track things upstream by guessing what cloud provider you use. Where do we draw the line, too? What if it's not AWS but Hetzner or some other boutique provider?
well usually you have no way to even validate the issue if it is due to a bad route, and giving out an inaccurate status report reflects poorly on a pristine:tm: status page. also status updates send out (in some cases) millions of notifications, so (global) notifications are reserved for P0 type issues.