by fiveoak 2619 days ago
It's annoying how often status pages for various services (when they even exist at all) show that things are working when they really aren't.
5 comments

I want status pages to show traffic to the status page itself over the last 24 hours (or some time period). A sudden uptick in traffic but green across the board would indicate that there is an issue, but they just haven't updated the page yet.
Yes! I have yet to convince anyplace I've been working to do this, but I do make increases in traffic volume to status pages generate non-critical alerts. Has caught out a few problems much earlier than they otherwise would have been discovered.
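A minimal sketch of that kind of alert, assuming you already collect hourly hit counts for the status page. The function names, the 3-sigma threshold, and the `min_hits` floor are illustrative choices, not anything described in this thread:

```python
from statistics import mean, stdev

def traffic_spike(hourly_hits, current_hits, threshold=3.0, min_hits=50):
    """Return True if current_hits is unusually high versus recent history."""
    if len(hourly_hits) < 2:
        return False  # not enough history to estimate a baseline
    if current_hits < min_hits:
        return False  # ignore noise on very low-traffic pages
    baseline = mean(hourly_hits)
    # Use a floor of 1.0 so perfectly flat history does not give a zero spread.
    spread = max(stdev(hourly_hits), 1.0)
    return current_hits > baseline + threshold * spread

def should_alert(hourly_hits, current_hits, all_green):
    """Raise a non-critical alert only when the page is green but traffic spikes."""
    return all_green and traffic_spike(hourly_hits, current_hits)

history = [100, 110, 95, 105, 98, 102]
print(should_alert(history, 400, all_green=True))
print(should_alert(history, 108, all_green=True))
```

The `all_green` check matters: if the page already shows an incident, extra traffic is expected and not worth alerting on.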
Sounds like an interesting way to make developers' (or ops') lives hell. Just point a traffic cannon at the status page and bingo bango, non-critical page.
If you can write a perfect automated status page - can't you basically write perfect integration tests and make sure no bad code gets deployed?

Bit of a chicken and an egg problem :)

Do you write every single test for your codebase? Do you have control over every line of code that gets deployed? The purpose of monitoring is to detect issues, because preventing them entirely is next to impossible.

By your logic if you can write perfect integration tests, can't you write perfect application code that doesn't need to be tested?

I think you've missed the point.

OP is arguing that a perfect codebase is not possible, therefore it's a bit unfair to complain that the status pages do not work perfectly. Hence the chicken and egg problem of "if I could write a perfect status page, I would have skills such that the status page would not be necessary".

There are two ways to have a status page.

1) A guy with a button. It's right, but it requires the guy to notice or get told - and remember to push the button.

2) Automation. But your automation has to somehow catch bugs that you can't even dream of. And not be triggered randomly. And somehow cover failures that can be detected automatically but not fixed at the source (if you know how something will fail, why not just fix that rather than adding more status checks?)

That may be true for small failures, but the AWS Status Page has routinely failed to report large scale outages.
It's because they're manually updated.
I've always wanted to build a public status page for services like Google Cloud and AWS with real tests against different services they offer. Not sure there is a good way to monetize though.
> StatusGator monitors the service status pages of more than 410 cloud services.

It looks like they are just an aggregator for status pages. I want something that does its own testing to determine if a service is up or down.
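A rough sketch of what "does its own testing" could look like, as opposed to scraping the vendor's status page. This only does a plain HTTP GET per endpoint; a real check against something like object storage would need to exercise actual operations (write, read, delete). The endpoint names and URLs are placeholders, not real service endpoints:

```python
import urllib.error
import urllib.request

def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False

def check_services(endpoints, probe=http_probe):
    """Probe each endpoint independently and report 'up' or 'down' per service."""
    return {name: ("up" if probe(url) else "down")
            for name, url in endpoints.items()}

if __name__ == "__main__":
    # Hypothetical endpoints; replace with URLs you actually own or may test.
    endpoints = {
        "storage": "https://storage.example.invalid/health",
        "compute": "https://compute.example.invalid/health",
    }
    print(check_services(endpoints))
```

Passing `probe` as a parameter keeps the reporting logic testable without touching the network, and lets each service get its own deeper check later.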

I've always wondered about this strategy. I'm a developer and a business owner. I'm in AWS's core target demographic. I know three things about AWS:

    - Their products have weird names
    - Their web console has horrible UX
    - They lie about uptime and server status
I'm not looking to switch to AWS soon.
I take it you have never used Azure's UI? AWS is much better. I can't comment on Google Cloud because I've never used it.
Some of the AWS UX is okay but the parameter store UX for the web console is some of the worst I've seen. It's difficult to navigate and paginate, and the search function is barely worth using. It would be amazing if they had fuzzy search on the names.
I do agree that the CloudWatch console search is like this, where you need to type out the entire string from the beginning to get it to match. But for the CloudFormation and Lambda consoles, I can just type a piece of the string I'm trying to find and all matching will just show up.
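To make the difference between those search behaviours concrete, here is a small sketch of prefix, substring, and fuzzy (in-order subsequence) matching. This is not how any AWS console is implemented, and the parameter names are made up for illustration:

```python
def prefix_match(query, name):
    """Match only if name starts with query (type the string from the beginning)."""
    return name.lower().startswith(query.lower())

def substring_match(query, name):
    """Match if query appears anywhere in name (type any piece of the string)."""
    return query.lower() in name.lower()

def fuzzy_match(query, name):
    """Match if the characters of query appear in name in order, gaps allowed."""
    remaining = iter(name.lower())
    # 'ch in remaining' consumes the iterator up to the first match,
    # so later characters must be found further along in name.
    return all(ch in remaining for ch in query.lower())

def search(names, query, matcher):
    return [n for n in names if matcher(query, n)]

names = ["/prod/db/password", "/prod/api/key", "/staging/db/password"]
print(search(names, "/prod", prefix_match))
print(search(names, "db", substring_match))
print(search(names, "prdpw", fuzzy_match))
```

With these names, the prefix search for `/prod` finds the two prod entries, the substring search for `db` finds both database passwords, and the fuzzy search for `prdpw` narrows to `/prod/db/password` alone.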
The UX is a bit poor, and there are long-standing issues like the S3 console timeout bug and the SWF console not obeying any known law of web design. But serious teams don't really use the console; they use tools like Terraform or CloudFormation, or they roll their own automation.

I use the AWS APIs almost every day but probably log in once a week or so.