I recently set up a status page for the services I run on my Pi. The idea was to get some insight and apply the experience at work.
My experience now tells me that what we really need first is a solid alerting system; the status page can't be trusted. It's a PR tool (a useful one), not a sysadmin tool.
I always think this should be an incident in itself: why did our status page not reflect the reality of a degraded service? It's so common for status pages to miss this that something user-driven like DownDetector is often more reliable.
I don't think an accurate, automated public status page is something any management would want. If it were accurate, they wouldn't be able to lie to customers about uptime. So I always suspect status pages are adjusted manually.
That's exactly what happens. The way to respond, though, is to stop linking to status pages hosted by that party and instead link to a StatusGator or DownDetector page as the 'source of truth'.
What about something like https://heiioncall.com/status (disclosure: helped build it), which gives a real-time view into what our monitoring & alerting system sees, from both various HTTP endpoint checks and cron job check-ins?
But I don’t think most orgs would want a public version of this (too much transparency), which is why we haven’t built that.
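For anyone wondering what "status derived straight from monitoring" could look like, here is a minimal sketch in Python. It is not how any particular product works; the names (`HttpCheck`, `CronCheckin`, `derive_status`) and the three status labels are assumptions made up for illustration. The point is only that the page's status is computed from the latest check results rather than set by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HttpCheck:
    """Hypothetical record of the latest HTTP endpoint probe."""
    name: str
    last_status_code: int


@dataclass
class CronCheckin:
    """Hypothetical record of the latest check-in from a scheduled job."""
    name: str
    last_seen: datetime
    max_interval: timedelta  # longest allowed gap between check-ins


def derive_status(http_checks, cron_checkins, now):
    """Return (status, failing_names) computed only from monitoring data."""
    # An HTTP check fails when the last response was not 2xx/3xx.
    failing = [c.name for c in http_checks
               if not 200 <= c.last_status_code < 400]
    # A cron job fails when it has not checked in within its allowed interval.
    failing += [c.name for c in cron_checkins
                if now - c.last_seen > c.max_interval]

    if not failing:
        return "operational", failing
    total = len(http_checks) + len(cron_checkins)
    return ("outage" if len(failing) == total else "degraded"), failing


# Example with made-up data: one healthy endpoint, one failing, one late job.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
http_checks = [HttpCheck("api", 200), HttpCheck("web", 503)]
cron_checkins = [CronCheckin("backup", now - timedelta(hours=2), timedelta(hours=1))]
print(derive_status(http_checks, cron_checkins, now))
# → ('degraded', ['web', 'backup'])
```

Because nothing here is editable by a human, the page can only say "operational" when every check agrees, which is exactly the transparency most orgs would be uncomfortable with.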