Hacker News
by teddyh 3477 days ago
Why should we be the ones to look into it? It was a random, intermittent, short-duration fault in the middle of the Internet, at some unknown place on the then-current path between us and Pingdom. Why shouldn’t Pingdom be at least equally obligated to look into it? After all, they’re the ones actually using the failing connection to monitor our services and others’. But no, Pingdom simply reports us as being down and leaves the hard part to us, i.e. explaining to our customers that the Pingdom report is provably incorrect.

I mean, what qualifies as “being up”? If some random link in the middle of the Internet goes down, and for 30 seconds you are suddenly unreachable for the few hundred people whose traffic happens to take that exact link as the best path to your server, can they claim that you have failed to provide adequate uptime? If such a fault happens, are you then responsible for troubleshooting it? I say no. The Internet is the ISPs’ responsibility, and the only faults actually worth reporting to your ISP are the repeatable or long-lasting ones. Small stuff like this is not worth anybody’s time (except the ISPs’) to dig into.
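To put a number on “small stuff”, here is a rough sketch of what one such fault does to monthly uptime. It assumes a 30-day month and, pessimistically, counts the whole 30-second blip against the site even though only the visitors routed through that one link were affected.

```python
# Sketch: the monthly uptime cost of a single short, path-specific blip.
# Assumption: a 30-day month, with the whole blip counted as downtime for everyone.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in an assumed 30-day month


def uptime_percent(downtime_seconds: float, period_seconds: float = SECONDS_PER_MONTH) -> float:
    """Return availability over the period as a percentage."""
    return 100.0 * (1 - downtime_seconds / period_seconds)


if __name__ == "__main__":
    # One 30-second blip in the month.
    print(f"{uptime_percent(30):.4f}%")
```

Under these assumptions it prints 99.9988%, which is still above a “four nines” (99.99%) figure.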


Well, if you're not providing a service to others, then you shouldn't be the ones to look into it. But if you're providing a service to users and they tell you it's down, then you should. It might be that your ISP has a misconfigured route that is flapping and intermittently causes errors from some locations, or a netmask is wrong somewhere and certain IP addresses can't reach you. It might not be a temporary thing. And if it's your ISP's fault, they might be able to fix it.

You seem to think that you have to investigate the issues yourself. On the contrary, you bump it up to your ISP to investigate. If your ISP regularly has these issues, then it might be time to switch to one with better peering agreements.

If the outages had been any of the following:

1. Reported as being experienced by an actual user of a web site,

2. Longer than a couple of minutes (they usually last just a few seconds),

3. More frequent than a few times per month,

then I might consider reporting it to my ISP. As it is, it’s not worth it. “Cosmic rays, man.” (https://www.joelonsoftware.com/2001/07/31/hard-assed-bug-fix...).
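The three conditions above amount to a simple rule. Here is a minimal sketch of it; the `Outage` record, the `worth_reporting_to_isp` function, and both thresholds (2 minutes standing in for “a couple of minutes”, 3 for “a few times per month”) are hypothetical, not anything Pingdom or an ISP actually defines.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta

# Hypothetical thresholds; the comment above only says "a couple of minutes"
# and "a few times per month".
MAX_BLIP = timedelta(minutes=2)
MAX_BLIPS_PER_MONTH = 3


@dataclass
class Outage:
    duration: timedelta
    reported_by_user: bool  # True if a real site visitor noticed it, not just a monitor


def worth_reporting_to_isp(outages_this_month: list[Outage]) -> bool:
    """Return True if any one of the three conditions holds for this month's outages."""
    # 1. An actual user experienced it.
    if any(o.reported_by_user for o in outages_this_month):
        return True
    # 2. It lasted longer than a couple of minutes.
    if any(o.duration > MAX_BLIP for o in outages_this_month):
        return True
    # 3. It happened more often than a few times this month.
    return len(outages_this_month) > MAX_BLIPS_PER_MONTH


if __name__ == "__main__":
    blips = [Outage(timedelta(seconds=30), False), Outage(timedelta(seconds=5), False)]
    print(worth_reporting_to_isp(blips))
```

For two short, monitor-only blips like these it returns False, i.e. “Cosmic rays, man.”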