by sailingparrot 961 days ago
I have never experienced an outage longer than 12 hours with any service provider in my ~13-year career (maybe I was lucky). But thanks to Cloudflare I have been able to enjoy not just one, but two ~24h outages in not even a month!

Jokes aside, it must be extremely stressful to be an SRE at CF recently. But something is clearly wrong over there. We have been burned so badly that there is no chance we will touch CF again in the next decade once our migration off of it is complete.


> But something is clearly wrong over there

We renewed our agreement with them in the middle of the year (~$50k) and they've yet to invoice us for it. Our financial controller noticed and I pinged our account rep a few times. Not a peep back.

My limited interaction with their sales & account management org gave me the impression of remarkable levels of disorganization. I know those tend to have a lot of turnover, but it seemed like they also weren't really training or managing them. Really weird vibes.
> two ~24h outages in not even a month

Wasn't the previous outage on Oct 30 less than an hour?

Yep, but on Oct 9 they were down for 22h.
Aiming high for that two-nines reliability.

You just can't get that level of reliability if you do it yourself, no matter how hard you try.

We won't do it ourselves, but we also won't do it with a provider that has accumulated 50+ hours of downtime in less than a month all the while having no communication or support.

That's barely clearing one nine of availability over the last 30 days (93%) for our particular stack on CF. This is insane.
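
For anyone who wants to check the arithmetic behind that 93% figure, here is a rough sketch. It assumes exactly 50 hours of downtime (the comment says "50+") over a 30-day window of 720 hours; the function names are made up for illustration and are not from any monitoring tool.

```python
import math

def availability(downtime_hours: float, window_hours: float) -> float:
    """Fraction of the window during which the service was up."""
    return 1.0 - downtime_hours / window_hours

def nines(avail: float) -> float:
    """Availability expressed as a count of nines (0.9 -> 1, 0.99 -> 2)."""
    return -math.log10(1.0 - avail)

def allowed_downtime(n_nines: int, window_hours: float) -> float:
    """Downtime budget in hours for a target of n_nines over the window."""
    return window_hours * 10.0 ** (-n_nines)

WINDOW_HOURS = 30 * 24   # 30-day window, 720 hours
DOWNTIME_HOURS = 50      # assumed lower bound taken from "50+ hours"

avail = availability(DOWNTIME_HOURS, WINDOW_HOURS)
print(f"availability: {avail:.2%}")            # availability: 93.06%
print(f"nines:        {nines(avail):.2f}")     # nines:        1.16

# Budgets for comparison: one nine allows about 72 h of downtime in
# 30 days, two nines only about 7.2 h.
for n in (1, 2, 3):
    print(f"{n} nine(s): {allowed_downtime(n, WINDOW_HOURS):.2f} h allowed")
```

So 50 hours of downtime fits inside the one-nine budget (about 72 hours) but is roughly seven times over the two-nines budget (about 7.2 hours), which is consistent with "barely clearing one nine".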

Mind you, the last time we were hit by a 22h outage, on Oct. 9, we didn't get so much as an email from CF either during or after the outage.

The lack of communication or support is the real killer here, imo. I can understand them having some catastrophic issues, which I would be reasonably confident they could fix; it's the uncertainty of the situation that makes me worry. Is this so easy to fix that it will be back in less than an hour and they will communicate then? Or are they going dark and I need to find a new provider asap?
To be fair, their status page says emails don’t work haha
It's long been accepted practice in the hosting industry to have your critical communications as a provider (status page, support system) hosted somewhere that's not your network, for this reason.

It continues to amaze me how major infrastructure providers seem to consistently fuck this one up (see also: AWS' status page outage a while ago).

We used to joke that we had five 8s of availability.