Hacker News
by lucgagan 992 days ago
> Why the focus on synthetic monitoring? As an SRE, I actively eschew synthetic monitoring. It's highly error-prone and doesn't actually indicate regional availability. I'd like a status site that I could push an internally derived SLA for a given service to, with the status site reflecting the average of that windowed SLA over time.

As an end user, hard disagree.

GitHub is a great example of this. Their status almost always shows 100% uptime while the service is entirely unstable.

It is clear that their uptime SLAs do not align with end user experience.

As an end user, I care whether I can access and use the service. I don't care what broke in between.

1 comment

I suspect that on GitHub's end this comes down to how they populate their status site. They may update it manually once they identify customer impact. If they're using internal metrics to drive the status site, then they're likely not using all of the metrics needed to reflect customer impact. There's also a third possibility: something between you and GitHub, outside both GitHub's control and yours, is causing a partition or failure.

I agree with you that the ultimate value is in customer impact. My point was that measuring it is hard, and that synthetic monitoring is not the solution because it doesn't achieve what it sounds like it achieves.
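
For what it's worth, the "push an internally derived SLA and show its windowed average" idea from the quoted comment could look roughly like this. It's only a minimal sketch: the `WindowedSla` class, its method names, and the choice of availability as a 0 to 1 fraction are all made up for illustration, and it assumes samples are pushed in timestamp order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WindowedSla:
    """Rolling average of pushed SLA samples (hypothetical helper)."""

    window_seconds: float
    # Each entry is (timestamp, availability), with availability in [0, 1].
    samples: deque = field(default_factory=deque)

    def push(self, timestamp: float, availability: float) -> None:
        """Record one internally derived SLA sample for the service."""
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        self.samples.append((timestamp, availability))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop samples that have fallen out of the window ending at `now`.
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()

    def average(self, now: float) -> Optional[float]:
        """Mean availability over the window, or None if there is no data."""
        self._evict(now)
        if not self.samples:
            return None
        return sum(a for _, a in self.samples) / len(self.samples)


sla = WindowedSla(window_seconds=3600)
sla.push(0, 1.0)
sla.push(1800, 0.5)
print(sla.average(now=1800))
```

Note that returning `None` for an empty window is a deliberate choice here: a status site with no recent data probably shouldn't report 100%.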