The great e-mail collapse occurred in the late 1990s and early 2000s, when spam made it prohibitively difficult for most users to actually run federated e-mail endpoints. By now it's such a pain that very few even attempt it. The rest is moot.
That's the problem with federated protocols. Without someone who owns the system and who has the resources and central authority to police it, if it becomes popular it will be destroyed by spam and other abuse. (Effectively a Sybil attack.) Self-policing protocols (without costly proof of work) are an "AI-hard" problem, since your adversary is the human intelligence of the protocol's exploiters.
Niche federated protocols avoid this fate by never becoming popular. The other way to avoid this fate is to impose a severe work function like Bitcoin and other blockchains, but this is too expensive (figuratively and literally) for most applications. Could you imagine forum software that requires a minimum of several hundred watts of power to participate in the network?
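For anyone who hasn't seen what a "work function" looks like, here is a minimal hashcash-style sketch. It is only an illustration of the idea, not Bitcoin's actual scheme: the function names, the "message:nonce" encoding and the difficulty value are all assumptions made up for this example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(message: str, difficulty: int) -> int:
    """Brute-force a nonce so SHA-256("message:nonce") has `difficulty` leading zero bits.

    Hypothetical scheme for illustration: expected cost is about 2**difficulty hashes.
    """
    for nonce in count():
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(message: str, nonce: int, difficulty: int) -> bool:
    """Checking a stamp costs one hash, no matter how expensive it was to mint."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    stamp = mint("post: hello forum", difficulty=16)
    print(stamp, verify("post: hello forum", stamp, difficulty=16))
```

The asymmetry is the whole point: the sender pays roughly 2**difficulty hashes per message while the receiver pays one. Raising the difficulty far enough to hurt a spammer also hurts every legitimate participant, which is the cost complaint above.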
Alternatively, if future federated protocols are designed with spam and abuse avoidance in mind, then perhaps their tech stacks could also be built to be easier for a layperson to stand up and roll out. I'm not saying protocol devs should be able to tell the future... but if they go in with abuse avoidance AND ease of rollout in mind, that would go a long way for federated-style platforms... Or maybe I'm just being overly optimistic! ;-)
Well fortunately with the major influence Gmail has in the personal email and work email space these days, your wish is somewhat true.
Having said that, Google certainly runs many, many mail servers, so an outage that impacts delivery for one group of people does not necessarily impact everyone. This is the difference between robust systems and those with critical linchpins that create system-wide outages.
If our email is down but our customers' email isn't down, then they think we're ignoring their "urgent" messages, and that's bad.
But if everyone's email is down, they won't be able to send those messages, so they won't think we received them to ignore them.
(Then again, email is kind of uniquely bad here, because of the way SMTP works as a store-and-forward protocol. In most protocols, if my server went down, your client wouldn't be able to connect to it, so it'd be pretty clear something is wrong. With SMTP, your client can just "put a message on the Internet" for my server to receive, and won't know for 48 hours or more that my server isn't there any more.)
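A toy simulation of that store-and-forward behaviour, as a rough sketch only. The function name, the 4-hour retry interval and the hour-based clock are assumptions for illustration; the 48-hour give-up window is just the figure from the comment above, and real mail servers make all of this configurable.

```python
def queue_outcome(server_is_up, retry_every_h=4, give_up_after_h=48):
    """Simulate a sending mail server's retry queue.

    `server_is_up(hour)` is a hypothetical callback saying whether the
    destination server accepts mail at that hour. Returns a tuple of
    (outcome, hour at which the sender finally learns that outcome).
    """
    hour = 0
    while hour < give_up_after_h:
        if server_is_up(hour):
            return ("delivered", hour)
        hour += retry_every_h  # message stays queued; the sender sees nothing
    return ("bounced", give_up_after_h)


if __name__ == "__main__":
    # Destination server is gone for good: the bounce only shows up at hour 48.
    print(queue_outcome(lambda hour: False))
    # Destination server comes back after 10 hours: next retry succeeds silently.
    print(queue_outcome(lambda hour: hour >= 10))
```

From the sender's side, a dead server and a slow one look identical until the queue finally gives up.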
I meant total downtime for each user. So if I lose email for an hour, and my coworker loses email for an hour, it is better if those are the same hour; otherwise there are two hours when we can't email each other.
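To put numbers on it: the time two people can't email each other is the union of their outage windows, not the sum. A small sketch, where the helper name and the (start, end) hour tuples are my own illustration:

```python
def union_length(intervals):
    """Total time covered by (start, end) intervals, counting overlaps only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged window.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


if __name__ == "__main__":
    same_hour = [(9, 10), (9, 10)]         # both of us down 09:00-10:00
    different_hours = [(9, 10), (14, 15)]  # me at 09:00, coworker at 14:00
    print(union_length(same_hour), union_length(different_hours))
```

Each person still loses exactly one hour in both cases; only the time the pair is cut off from each other changes.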
I wonder where the sweet spot is between a centralized service like Slack, and extremely decentralized scenarios where people just try to cope with a multitude of one-to-one channels?
Federation is a given for this hypothetical sweet spot of course, but how do you find the spot? Are there any HN readers who can point me to research in this area?
djb half-proposed IM2000 as a replacement for SMTP and POP and IMAP.
Briefly: you send a message from your user agent (mail program) to your own mail server. Your mail server sends a notification to all the destination mail servers, a notification basically consisting of the headers.
The destination mail server lets the recipient know that the notification has arrived, possibly doing filtering and sorting and prioritization and stuff.
The recipient fetches the mail body from the originating mail server, and then does whatever.
The big change here is that the notifications are store-and-forward but the mail itself is not. The originating mail server needs to be up and functional in order to get a message body delivered.
Spammers are severely impeded: the message body can't be sent unless they have a reliable, traceable machine up when people get around to reading mail. Botnets won't work. Yet anybody who can run a reliable server can run their own mail server.
Mailing lists only send the full body to people who request it. Unsubscribe is actually worthwhile for any legitimate company to implement. Mailing list servers can easily implement archives by just keeping mail available.
And finally, the holy grail of Outlook users is actually implementable: you can cancel an email after you sent it and have that actually work, as long as people haven't pulled the body down yet.
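A minimal in-memory sketch of that flow, to make the "notification is pushed, body is pulled" split concrete. This is not djb's design in any detail (IM2000 was never fully specified); every class and method name here is hypothetical, and there is no networking, authentication or filtering.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Notification:
    """What gets pushed to the destination: headers plus where to fetch the body."""
    message_id: int
    origin: "OriginServer"
    headers: dict


class DestinationServer:
    """The recipient's server: holds notifications only, never message bodies."""

    def __init__(self):
        self.inbox = []

    def notify(self, note):
        self.inbox.append(note)


class OriginServer:
    """The sender's server: keeps the body until it is pulled or cancelled."""

    def __init__(self, name):
        self.name = name
        self._bodies = {}
        self._ids = itertools.count(1)

    def send(self, headers, body, destinations):
        message_id = next(self._ids)
        self._bodies[message_id] = body
        note = Notification(message_id, self, dict(headers))
        for dest in destinations:
            dest.notify(note)  # only the headers travel
        return message_id

    def fetch_body(self, message_id):
        """Called on behalf of the recipient; returns None if the body is gone."""
        return self._bodies.get(message_id)

    def cancel(self, message_id):
        """Withdraw a message; this works for anyone who has not pulled it yet."""
        return self._bodies.pop(message_id, None) is not None


if __name__ == "__main__":
    origin, dest = OriginServer("sender.example"), DestinationServer()
    mid = origin.send({"Subject": "lunch?"}, "Noon at the usual place.", [dest])
    note = dest.inbox[0]
    print(note.headers, note.origin.fetch_body(note.message_id))
    origin.cancel(mid)
    print(note.origin.fetch_body(note.message_id))  # body has been withdrawn
```

Because the body lives only on the originating server, that server has to stay up and reachable until recipients read their mail, which is the property that is supposed to hurt botnet spam and make cancellation work.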
Federation works just fine - the problem is getting "non-techies" to care enough to join in, discoverability, and ease of joining. So they almost all suffer from lack of a social network and are used by only a niche community.
People will use what most of their friends use. If their friends and the people they want to follow all use Twitter, why would they use GNUSocial? The answer is they won't.
There have been occasions where, due to a massive routing snafu, the whole internet became unusable. Sprint once botched their backbone so badly it started blackholing all traffic, and on another occasion tons of major routes were inexplicably shifted to China.
The top-level routing is held together with duct tape and lots of carefully trained eyes looking for problems with it.
Slack is down, productivity goes up... After using it for a while, I find it handles communication within a team well but does almost nothing to increase collaboration, so I guess Slack is in an area that apps like Discord will disrupt.
Discord's a compelling alternative, but it's just another flavour of the same. If you want distributed, you need something more like IRC or at least federated XMPP.
Eh, I don't think a post-mortem is really necessary. They said it was a broken code change on the status page. If they wrote a post-mortem, it'd probably just say "A team member forgot to foobar the bazqux when deploying an update. We immediately followed our playbook for rolling back a failed deployment and restored service within 9 minutes."
Given the increasing dependence on messaging platforms, I think any widespread downtime deserves a public post-mortem.
And that's not to mention the amazing amount of curiosity and interest that any downtime in a large public system generates. From the PR side, I would think that some kind of post-mortem is almost necessary to prevent that curiosity and interest from turning into distrust and negative perception.
I've been tempted to stop using it, but instead just shut off notifications. I really like being able to pull it up quickly to see when code has been pushed up, pull requests created / merged, or whatever automated CI action is going on for a particular project.
The chatter can burn a lot of time though. You're absolutely right there.
We solved this by creating a separate channel for gifs and news. Those interested can participate, those not interested can just mute the channel and ignore it.
Working as a publisher and having channels to talk to each of our clients is amazing and increases productivity and communication way beyond just email and phone. But that's just my particular use-case :)
That's why you didn't hear about the great email collapse of 2006.