by johnduhart 1717 days ago
I think you need to readjust your expectations. It's not reasonable to have a fully fleshed out RCA blog post available within hours of incident resolution. Most other cloud providers take a few days for theirs.
2 comments

I mean, not an RCA per se, but info more akin to Cloudflare's blog post would be very welcome IMHO: https://blog.cloudflare.com/october-2021-facebook-outage/
Both posts have essentially the same info - the Facebook one just didn't include an explainer on how the internet works.
The Cloudflare post includes graphs depicting how the issue looked downstream and a rough initial timeline for the incident. The Facebook post says basically nothing more than that there was a networking mistake.

I hope we get something more substantial and informative than that over the next couple of weeks, but it doesn't seem (at least from my searching) that Facebook is in the business of publicly posting in-depth post-mortems for their outages, which I personally find unfortunate.

Cloudflare can scratch the surface of the issue and it wouldn't matter; it is a content marketing piece after all. Facebook, otoh, needs to be thorough.
How tech savvy are people who pay Cloudflare money?

vs

How tech savvy are people that Facebook profits from?

Gotta target your audience, every communication is PR...

It’s not reasonable to demand any details at all. It’s nice of them to notify people of what went wrong, but it really is none of our business.
On the off-chance this isn't sarcasm, Facebook's routing shenanigans slowed down the entire internet. Not to mention that they're a publicly traded company, and one which has gone out of its way to assume an infrastructure role. They don't have a right to privacy here, and we are all owed an explanation.
I don't want an explanation, nor do I care. Facebook could disappear tomorrow like all the other networks before it and it wouldn't make a dent in my day.
Do you honestly believe that Facebook and its subsidiaries don't have a major impact on the world?
Whether we like it or not, all three platforms are relied upon by hundreds of millions of people and businesses every day for communication.

I'm sure the world would quickly adapt by re-adopting these things called "websites" and "email" but in the meantime, it's highly self-centered to think this "didn't matter".

Okay? Why are you commenting here then?
Or else? You will angrily stamp your foot? Start an e-petition?
This comment thread is about whether we ought to receive an explanation, not the practical likelihood of getting one.
> Facebook's routing shenanigans slowed down the entire internet

This is Hacker News, so the distinction between network performance, server performance and application performance should matter.

"The Internet" did not slow down. "The Internet" in fact probably had more available capacity as a result of Facebook's outage, as all those bits of outrage and cats ceased to be transferred for the duration.

Some applications may have seen performance hits as a result of poorly thought-out dependencies on an external service without graceful failure.

Some applications may have seen increased load and suffered due to server resourcing constraints, caused by applications like the above failing to fail gracefully, and instead polling more aggressively.
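For what "failing gracefully" could look like on the client side, here is a minimal sketch of capped exponential backoff with jitter, as opposed to retrying faster when a dependency goes away. Everything in it is an illustrative assumption: the function names, the default delays, and treating `OSError` as the failure signal are made up for the example and say nothing about how any real Facebook client behaves.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=None):
    """Return one jittered delay (in seconds) per attempt, capped at `cap`."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles after each failure but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: a uniform pick in [0, ceiling] keeps clients from retrying in lockstep.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


def call_with_backoff(operation, sleep=time.sleep, base=1.0, cap=60.0,
                      attempts=6, rng=None):
    """Call `operation` up to `attempts` times; return (succeeded, result)."""
    delays = backoff_delays(base, cap, attempts, rng)
    for index, delay in enumerate(delays):
        try:
            return True, operation()
        except OSError:
            # Wait before the next attempt, but not after the final one.
            if index < len(delays) - 1:
                sleep(delay)
    # Give up quietly so the caller can degrade instead of hammering the service.
    return False, None
```

A caller would wrap its lookup or request in `call_with_backoff(...)` and fall back to a degraded mode when the first element of the returned tuple is `False`.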

> They don't have a right to privacy here, and we are all owed an explanation

Morally / ethically, you're right. The fact that Facebook exists in its current form tells me that morals and ethics aren't particularly important to the real world.

Perhaps you're unaware that billions of devices attempting to resolve Facebook's unresolvable domains effectively DDoS-ed the DNS system? It most certainly did slow down big chunks of the internet which otherwise had nothing to do with Facebook.

https://www.theverge.com/2021/10/4/22709123/facebook-outage-...

> Some applications may have seen increased load and suffered due to server resourcing constraints, caused by applications like the above failing to fail gracefully, and instead polling more aggressively.

I had Cloudflare's woes in mind when I wrote that.

So your point is that it didn't slow down the entire Internet, only the parts of the Internet that use DNS (damn near all of them)?