by mmcallister 1865 days ago
I think the _real_ takeaway here should be that your status page should be simple, with as few dependencies as possible, ideally none.

IMHO it should be a static page built with Hugo or Jekyll, or roll your own if you really have to.

None of this precludes you from using the same domain, though you'd want to use a subdomain, for instance status.product.com.

3 comments

You can't avoid all dependencies. Certainly, avoiding a dependency on a complicated application server is good, but somewhere in the mix there has to be a DNS nameserver, an HTTPS server, some kind of persistent storage for the status, and some way for you to update that status. If your status page is a static page, then that just means that your persistent storage is wherever the HTML page itself is stored; an S3 bucket, say, or the filesystem of a machine that's running Apache.
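To make the static-page case concrete, here is a rough sketch of a roll-your-own generator in Python. The function name `render_status_page` and the component names are made up for illustration; the only point is that the "persistent storage" ends up being the rendered HTML itself, wherever you choose to put it.

```python
import html
from datetime import datetime, timezone


def render_status_page(components, updated_at):
    """Return a complete HTML document listing each component and its state.

    `components` maps a component name to a free-text state. Both the
    names and the states are hypothetical; nothing here is tied to a
    particular status-page product.
    """
    rows = "\n".join(
        f"    <li>{html.escape(name)}: {html.escape(state)}</li>"
        for name, state in components.items()
    )
    stamp = updated_at.strftime("%Y-%m-%d %H:%M UTC")
    return (
        "<!doctype html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>Status</title></head>\n'
        "<body>\n"
        f"  <p>Last updated: {stamp}</p>\n"
        "  <ul>\n"
        f"{rows}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical states, edited by hand during an incident.
    page = render_status_page(
        {"API": "degraded", "Website": "operational"},
        datetime.now(timezone.utc),
    )
    # Print the page; redirect it to a file and upload that file to
    # whatever static host you use (a bucket, a plain web server, etc.).
    print(page)
```

Updating the status then amounts to rerunning the script and replacing one file, which still leaves you depending on the host that serves that file.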

The thing being pointed out in the OP is that, while you can't avoid all dependencies, you can avoid having any shared dependencies (other than core internet infrastructure) between your status page and the service whose status it's reporting on. That way, outage risk will not be correlated between the two, which is generally good enough, since almost no one cares about your status page when your actual service is not experiencing an outage. One effective way to do this is with a service like status.io that specifically hosts status pages and specializes in having very high uptime for just that one kind of page.

If you use the same domain for your status page as for your main service, then that may mean that an outage in your application server, application database, etc., won't affect your status page, but the two will still share a dependency on your load balancer (i.e., whatever your A records are pointing directly at), so if anything goes wrong there then your status page will go down with your main service. If you use a subdomain then there won't necessarily be a shared dependency on the load balancer, but there will be one on the DNS nameservers. The only way to avoid all shared dependencies is to use an entirely separate domain.
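The comparison above can be sketched as plain set intersections. This is only a toy model in Python: the dependency labels, the `static-file-host` entry, and the separate domain `productstatus.example` are assumptions made up for illustration, and core internet infrastructure is deliberately left out.

```python
# Hypothetical dependencies of the main service at product.com.
MAIN_SERVICE = {
    "nameservers:product.com",
    "load-balancer:main",
    "app-server",
    "app-database",
}

# Hypothetical dependencies of the status page under each hosting option.
STATUS_PAGE_OPTIONS = {
    "same domain": {
        "nameservers:product.com",
        "load-balancer:main",
        "static-file-host",
    },
    "subdomain": {
        "nameservers:product.com",
        "static-file-host",
    },
    "separate domain": {
        "nameservers:productstatus.example",
        "static-file-host",
    },
}


def shared_dependencies(service, status_page):
    """Return the dependencies whose failure would take down both."""
    return sorted(service & status_page)


if __name__ == "__main__":
    for option, deps in STATUS_PAGE_OPTIONS.items():
        print(option, "->", shared_dependencies(MAIN_SERVICE, deps))
```

In this model only the separate domain comes out with an empty intersection. A real audit would also have to check things the model ignores, such as whether both domains use the same registrar or DNS provider.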

I don't know if Discord's problem that the OP is talking about had anything to do with DNS, but I think that's been a source of outages for them in the past, in which case a separate domain is the solution.

The behavior I observed in this particular case is that their server would time out on requests, especially writes, then come back with a quick 502, then time out again. Their status page displayed the same kind of behavior. I wouldn't be surprised if the status page even shares its webservers with their main service.

Indeed, this is the approach the Tor Project took recently: they are using cState, which is based on Hugo.

https://blog.torproject.org/check-status-of-tor-services

I thought the real takeaway was that your status page should be up even if your application is down, or that the two have to be independent.