by lngarner
1206 days ago
Hi! Thanks for asking. Basically, status pages get updated manually, and people decide whether and when an outage is bad enough to warrant a status page update. We monitor actual functionality, so we capture smaller glitches that either escape human attention altogether or never get escalated to the point where the status page is updated. In more detail, this can happen for three reasons:
1.) We use functional testing, so we're simply showing which aspects of the platform are working and which are not. Because of how "outages" are defined in SLAs, vendors like Datadog might not categorize or disclose certain dysfunctions as outages, so those never appear on their status page. In other words, some outages are considered too "minor" to be included.
2.) Status pages are manual, Metrist is automatic. Datadog might not have updated the page yet, or might not even be fully aware of the outage. Our tests just show the objective data as it's happening.
3.) Everyone experiences outages differently. The data in the demo is Metrist's experience with Datadog, and it can differ slightly from what other people see (another reason why status pages can be vague). That's why we have an orchestrator that lets people set up personalized monitoring, so they know in real time exactly how a vendor is affecting them and whether an outage is relevant to them. Does that answer your question? LMK if I can follow up with more info. :)
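To make the three points above a bit more concrete, here is a rough sketch of the general idea of a functional check compared against a status page. This is not Metrist's actual code: the names `CheckResult`, `run_check`, and `unreported_failures` are made up for illustration, and the two probes are stand-ins for real API calls against a vendor.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    seconds: float
    error: str = ""


def run_check(name: str, probe: Callable[[], None]) -> CheckResult:
    """Run one functional probe; any exception counts as a failure."""
    start = time.monotonic()
    try:
        probe()
        return CheckResult(name, True, time.monotonic() - start)
    except Exception as exc:  # a failed probe is data to record, not a crash
        return CheckResult(name, False, time.monotonic() - start, str(exc))


def unreported_failures(results: list[CheckResult], status_page_ok: bool) -> list[str]:
    """Names of checks that failed while the status page reported no problem."""
    if not status_page_ok:
        return []
    return [r.name for r in results if not r.ok]


# Hypothetical probes standing in for real calls to a vendor API.
def submit_metric() -> None:
    pass  # pretend the call succeeded


def query_logs() -> None:
    raise TimeoutError("no response after 30s")


results = [
    run_check("submit_metric", submit_metric),
    run_check("query_logs", query_logs),
]
print(unreported_failures(results, status_page_ok=True))  # ['query_logs']
```

The point of the sketch is only that the probe result comes from exercising the feature yourself, so it does not depend on anyone deciding to update a status page.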
|
This bugs me to no end. I don't want to name names, but I had a devops service that was returning an odd error implying I was doing something wrong. The status page said everything was good. After several hours I emailed them and was told it was actually down, they were aware, and they were working on it. It eventually got fixed, they emailed back, and all was well. The status page never did show any downtime.