by pythux 805 days ago
The status page says it was resolved 8 minutes ago (Apr 05, 2024 - 08:48 UTC): https://www.githubstatus.com/incidents/bnkkbj90yhz6

But it is definitely still happening now (500s on refresh on PRs and GitHub Actions).

Edit: still ongoing

Edit 2: still ongoing at Apr 05, 2024 - 08:56 UTC (keeping updated for the record since their status page cannot be trusted apparently)

Edit 3: I see they have switched to a different (ongoing) incident ID now: https://www.githubstatus.com/incidents/5ly0psff2s5d


Status pages can never be trusted.
I recently set up a status page for the services I run on my pi. The idea was to get some insights and apply experience at work.

My experience now tells me that what we really need first is a solid alerting system; the status page can't be trusted. It's a PR tool (a useful one), not a sysadmin tool.

If the status page officially authorized by the company’s management team cannot be trusted, then why trust the company in the first place?
Is the company in the business of selling status pages?
If they market it as a feature, yes?

Were you expecting me to say something else?

> If they market it as a feature, yes?

Well, the status page is not the main feature of GitHub/GitLab, but I agree that if customers decide to rely on it then it's a problem.

> Were you expecting me to say something else?

Yeah, something like "their status page is not their core business" or "status pages and SLAs are two distinct things".

I always think this should be an incident in itself: why did our status page not reflect the reality of a degraded service? It's so common that they don't, and something user-driven like DownDetector is often more reliable.
I don't think an accurate and automated public status page is something any management would want. If it were accurate, they wouldn't be able to lie to customers about the uptime. So I always suspect status pages are adjusted manually.
That's exactly what happens. The way to respond, though, is to stop linking to status pages hosted by that party; instead we should link to a StatusGator or DownDetector page as the 'source of truth'.
> why did our status page not reflect the reality of a degraded service?

There was a conflict with marketing, market movement, SLA contracts, and our image.

What about something like https://heiioncall.com/status (disclosure: I helped build it), which gives a real-time view into what our monitoring & alerting system sees, both from various HTTP endpoint checks and from cron job check-ins?

But I don’t think most orgs would want a public version of this (too much transparency), which is why we haven’t built that.
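
For illustration only, here is a minimal Python sketch of how a status could be derived automatically from the two kinds of signals mentioned above (HTTP endpoint checks and cron job check-ins). It is not how any particular product works; the class names, the check names, the "2xx/3xx is healthy" rule, and the three status labels are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass
class HttpCheck:
    # Hypothetical record of one HTTP probe result.
    name: str
    status_code: Optional[int]  # None means the request failed or timed out


@dataclass
class CronCheckin:
    # Hypothetical record of a scheduled job's heartbeat.
    name: str
    last_seen: datetime     # time of the job's most recent check-in
    max_silence: timedelta  # longest acceptable gap between check-ins


def http_ok(check: HttpCheck) -> bool:
    # Assumption: any 2xx/3xx response is healthy; anything else is failing.
    return check.status_code is not None and 200 <= check.status_code < 400


def cron_ok(checkin: CronCheckin, now: datetime) -> bool:
    # A job is healthy if it has checked in recently enough.
    return now - checkin.last_seen <= checkin.max_silence


def overall_status(
    http_checks: Sequence[HttpCheck],
    cron_checkins: Sequence[CronCheckin],
    now: datetime,
) -> str:
    # Collect the names of every failing check, then summarize.
    failures = [c.name for c in http_checks if not http_ok(c)]
    failures += [c.name for c in cron_checkins if not cron_ok(c, now)]
    total = len(http_checks) + len(cron_checkins)
    if not failures:
        return "operational"
    if len(failures) == total:
        return "major outage"
    return "degraded"


if __name__ == "__main__":
    now = datetime(2024, 4, 5, 8, 56, tzinfo=timezone.utc)
    checks = [HttpCheck("pull-requests", 500), HttpCheck("homepage", 200)]
    crons = [CronCheckin("nightly-backup", now - timedelta(minutes=5), timedelta(hours=1))]
    print(overall_status(checks, crons, now))
```

The point of the sketch is that the status is computed purely from what the monitoring sees, with no manual step in between, which is exactly the property the manually curated status pages discussed in this thread lack.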

=> Third-party sites can never be trusted.
=> Nobody can ever be trusted.
=> I think, therefore I am. Everything else is speculation.
=> Certainly is.
This is a first-party site.
This is the other-party site. Could be first in a sense but inevitably the other.
Huh? This site is part of GitHub. It is on a different domain (presumably) so that if there is a DNS outage it will not be affected.
Initially, perhaps by different teams, three separate incidents were created. Since then, two have been resolved, and one continues to be updated.