by tiredofcareer 4790 days ago
> Unless it's becoming a problem, I think anyone would ignore increased latency because they have ten other work tasks to deal with.

You'd be surprised once you start working with larger, higher-traffic infrastructures. If our average external DNS query time rises by 200 ms, my phone goes off. There's more slack on p99, but it's monitored too.
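
To make the average-versus-p99 distinction concrete, here is a minimal sketch of that kind of check. Only the 200 ms figure comes from the comment above; the function names, the baselines, and the 500 ms p99 slack are hypothetical values picked for illustration, not the actual alerting rules.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def dns_alerts(samples_ms, baseline_mean_ms, baseline_p99_ms,
               mean_slack_ms=200.0, p99_slack_ms=500.0):
    """Return which latency statistics have drifted past their allowed slack.

    mean_slack_ms mirrors the 200 ms figure from the comment.
    p99_slack_ms is an assumed, looser budget ("more slack on p99").
    """
    alerts = []
    if mean(samples_ms) - baseline_mean_ms >= mean_slack_ms:
        alerts.append("mean")
    if percentile(samples_ms, 99) - baseline_p99_ms >= p99_slack_ms:
        alerts.append("p99")
    return alerts
```

With a 20 ms baseline, a window where every query takes 230 ms trips only the mean check, while a window where 2 queries out of 100 take 600 ms trips only the p99 check.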

Every phase of a request to the system I administer is timed by a small libcurl app running remotely from multiple ASNs, because Pingdom and similar services do not provide the resolution we need. The timings are rendered on a stacked graph that always lives on my third monitor, and any significant deviation in the five-minute average catches my eye.
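
For anyone wondering what per-phase timing looks like, here is a rough sketch and not the actual app. libcurl reports its timers (CURLINFO_NAMELOOKUP_TIME, CURLINFO_CONNECT_TIME, CURLINFO_APPCONNECT_TIME, CURLINFO_PRETRANSFER_TIME, CURLINFO_STARTTRANSFER_TIME, CURLINFO_TOTAL_TIME) as cumulative seconds since the start of the transfer, so a stacked graph needs the differences between them. The dict keys, phase labels, and function names below are hypothetical, and the 300-second window is just the five-minute average mentioned above.

```python
# Stacked-graph phase label paired with the cumulative libcurl timer it ends at.
# The short timer keys are hypothetical names for the CURLINFO_*_TIME values.
PHASES = (
    ("dns", "namelookup"),
    ("tcp_connect", "connect"),
    ("tls", "appconnect"),
    ("request_sent", "pretransfer"),
    ("first_byte", "starttransfer"),
    ("transfer", "total"),
)


def phase_durations(timers):
    """Convert cumulative timers (seconds) into per-phase durations."""
    durations = {}
    previous = 0.0
    for phase, key in PHASES:
        stamp = timers.get(key, 0.0)
        # A timer that never fired (appconnect is 0 for plain HTTP)
        # contributes a zero-length phase instead of a negative one.
        if stamp < previous:
            stamp = previous
        durations[phase] = stamp - previous
        previous = stamp
    return durations


def window_mean(samples, now, window_s=300.0):
    """Mean of (timestamp, value) samples within the last window_s seconds."""
    recent = [v for t, v in samples if now - window_s < t <= now]
    return sum(recent) / len(recent) if recent else None


def deviates(samples, now, baseline, tolerance, window_s=300.0):
    """True if the windowed mean is further than tolerance from baseline."""
    avg = window_mean(samples, now, window_s)
    return avg is not None and abs(avg - baseline) > tolerance
```

Each probe location would feed one `phase_durations` result per request into the graph, and `deviates` is the programmatic version of "catches my eye".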

I know it sounds like overkill, but it's crucial at scale.

2 comments

That monitoring sounds awesome. What is it? Is it available to the public? Which method do you use to graph the data?

You monitor your DNS latency, but you don't monitor your delegations?