Slack is Down (status.slack.com)
65 points by fjordan 3384 days ago
6 comments

It's harder (but not impossible) to have complete service lossage like this in a federated protocol.

That's why you didn't hear about the great email collapse of 2006.

The great e-mail collapse occurred in the late 1990s and early 2000s when spam made it prohibitively difficult for most users to actually run federated e-mail endpoints. At this point it's an amazing pain to the point that very few attempt it. The rest is moot.

That's the problem with federated protocols. Without someone who owns the system and who has the resources and central authority to police it, if it becomes popular it will be destroyed by spam and other abuse. (Effectively a sybil attack.) Self-policing protocols (without costly proof of work) are an "AI-hard" problem since your adversary is the human intelligence of the protocol's exploiters.

Niche federated protocols avoid this fate by never becoming popular. The other way to avoid this fate is to impose a severe work function like Bitcoin and other block chains, but this is too expensive (figuratively and literally) for most applications. Could you imagine a forum software that requires a minimum of several hundred watts of power to participate in the network?

All others fall to the tragedy of the commons.
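To make the "work function" idea concrete, here is a minimal hashcash-style sketch. It is not Bitcoin's actual scheme: the function names, the `:` separator, and the 12-bit difficulty are assumptions chosen for illustration. The sender must search for a nonce, while the receiver checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_hash(message: bytes, nonce: int) -> bytes:
    # Hypothetical stamp format: message, a separator, then the nonce.
    return hashlib.sha256(message + b":" + str(nonce).encode()).digest()


def mint(message: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce; expected cost doubles with every extra bit."""
    for nonce in count():
        if leading_zero_bits(stamp_hash(message, nonce)) >= difficulty_bits:
            return nonce


def verify(message: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a stamp costs one hash, however expensive minting was."""
    return leading_zero_bits(stamp_hash(message, nonce)) >= difficulty_bits


if __name__ == "__main__":
    nonce = mint(b"hello forum", 12)  # 12 bits is a toy difficulty
    print(nonce, verify(b"hello forum", nonce, 12))
```

The asymmetry is the whole point, and also the cost: any difficulty high enough to hurt a spammer is paid by every honest participant too.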

Alternatively, if other/future federated protocols are developed with spam and abuse avoidance in mind, then perhaps the associated tech stacks could be built to be easier for the lay person to stand up and roll out. I'm not saying protocol devs should by default be able to tell the future... but if they go in with abuse-avoidance AND ease of rollout in mind, then that would go a long way for federated-style platforms... Or, maybe I'm just being overly optimistic! ;-)
Strong spam/abuse prevention, useful and accessible for most people, decentralized. Pick two.
Of course, if my email goes down, it does not help me that others' email is still up.

In fact, if total downtime is constant, I would prefer they overlap.

Well fortunately with the major influence Gmail has in the personal email and work email space these days, your wish is somewhat true.

Having said that, Google certainly runs many, many mail servers, so an outage that impacts delivery for one group of people does not necessarily impact everyone. This is the difference between robust systems and those with critical linchpins that create system-wide outages.

> In fact, if total downtime is constant, I would prefer they overlap.

So the people who are supposed to do work for you also can't work?

If our email is down but our customers' email isn't down, then they think we're ignoring their "urgent" messages, and that's bad.

But if everyone's email is down, they won't be able to send those messages, so they won't think we received them to ignore them.

(Then again, email is kind of uniquely bad here, because of the way SMTP works as a store-and-forward protocol. In most protocols, if my server went down, your client wouldn't be able to connect to it, so it'd be pretty clear something is wrong. With SMTP, your client can just "put a message on the Internet" for my server to receive, and won't know for 48 hours or more that my server isn't there any more.)
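A rough sketch of that store-and-forward behaviour, as a toy simulation rather than real SMTP code. The 4-hour retry interval and the function name are assumptions; the 48-hour give-up window is the figure from the comment above, and real mail servers configure their own schedules.

```python
from typing import Callable, Tuple


def deliver_with_retries(
    is_up: Callable[[int], bool],
    retry_every_h: int = 4,      # assumed retry interval
    give_up_after_h: int = 48,   # figure taken from the comment above
) -> Tuple[str, int]:
    """Return (outcome, hour at which the sender finally learns it)."""
    hour = 0
    while hour <= give_up_after_h:
        if is_up(hour):
            return ("delivered", hour)
        hour += retry_every_h    # message sits in the sender's queue
    return ("bounced", give_up_after_h)


if __name__ == "__main__":
    print(deliver_with_retries(lambda h: h >= 6))  # server back after 6 hours
    print(deliver_with_retries(lambda h: False))   # server gone for good
```

In the second case the sender hears nothing until the queue lifetime runs out, which is the "won't know for 48 hours" problem.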

I meant total downtime for each user. So if I lose email for an hour, and my worker loses email for an hour, it is better if those are the same hour, otherwise there are two hours when we can't email each other.
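The arithmetic, as a tiny sketch (the function name and the hour numbers are made up for illustration): two people can email each other only when both are up, so the time lost is the union of their outages.

```python
from typing import Set


def hours_unable_to_email(my_down: Set[int], their_down: Set[int]) -> int:
    """Hours in which at least one side is down, so no exchange is possible."""
    return len(my_down | their_down)


if __name__ == "__main__":
    print(hours_unable_to_email({9}, {9}))   # same hour: outages overlap
    print(hours_unable_to_email({9}, {14}))  # different hours: no overlap
```

With each side down for one hour, overlapping outages cost one hour of lost contact and non-overlapping outages cost two.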
And relevant to the market: why Freenode netsplits are more of a joke than a problem.
I wonder where the sweet spot is between a centralized service like Slack, and extremely decentralized scenarios where people just try to cope with a multitude of one-to-one channels?

Federation is a given for this hypothetical sweet spot of course, but how do you find the spot? Are there any HN readers who can point me to research in this area?

djb half-proposed IM2000 as a replacement for SMTP and POP and IMAP.

Briefly: you send a message from your user agent (mail program) to your own mail server. Your mail server sends a notification to all the destination mail servers, a notification basically consisting of the headers.

The destination mail server lets the recipient know that the notification has arrived, possibly doing filtering and sorting and prioritization and stuff.

The recipient fetches the mail body from the originating mail server, and then does whatever.

The big change here is that the notifications are store-and-forward but the mail itself is not. The originating mail server needs to be up and functional in order to get a message body delivered.

Spammers are severely impeded: the message body can't be sent unless they have a reliable, traceable machine up when people get around to reading mail. Botnets won't work. Yet anybody who can run a reliable server can run their own mail server.

Mailing lists only send the full body to people who request it. Unsubscribe is actually worthwhile for any legitimate company to implement. Mailing list servers can easily implement archives by just keeping mail available.

And finally, the holy grail of Outlook users is actually implementable: you can cancel an email after you sent it and have that actually work, as long as people haven't pulled the body down yet.
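A toy in-memory model of the flow described above, to show why cancelling works. This is not an implementation of any published IM2000 spec; every class and method name here is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Notification:
    """Headers only; the body stays on the sender's server."""
    msg_id: str
    sender: str
    subject: str
    origin: "OriginServer"


class OriginServer:
    """The sender's mail server: holds bodies until recipients pull them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.bodies: Dict[str, str] = {}

    def send(self, msg_id: str, sender: str, subject: str, body: str,
             destinations: List["DestinationServer"]) -> None:
        self.bodies[msg_id] = body
        note = Notification(msg_id, sender, subject, self)
        for dest in destinations:
            dest.notify(note)  # only the notification is pushed

    def fetch(self, msg_id: str) -> Optional[str]:
        if not self.online:
            return None        # origin must be up for the body to arrive
        return self.bodies.get(msg_id)

    def cancel(self, msg_id: str) -> None:
        self.bodies.pop(msg_id, None)


class DestinationServer:
    """The recipient's server: stores notifications, pulls bodies on demand."""

    def __init__(self) -> None:
        self.inbox: List[Notification] = []

    def notify(self, note: Notification) -> None:
        self.inbox.append(note)

    def read(self, index: int) -> Optional[str]:
        note = self.inbox[index]
        return note.origin.fetch(note.msg_id)


if __name__ == "__main__":
    origin, dest = OriginServer("example.org"), DestinationServer()
    origin.send("m1", "alice", "hi", "hello there", [dest])
    origin.send("m2", "alice", "oops", "sent in haste", [dest])
    origin.cancel("m2")
    print(dest.read(0), dest.read(1))
```

A cancelled message, or one whose origin server has vanished, simply yields no body when the recipient tries to read it, which is what makes fire-and-forget botnet spam hard in this model.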

Federation works just fine - the problem is getting "non-techies" to care enough to join in, discoverability, and ease of joining. So they almost all suffer from lack of a social network and are used by only a niche community.

People will use what most of their friends use. If their friends and people they want to follow all use Twitter, why would they use GNUSocial? Answer is they won't.

You mean the USDoHS email storm of 2007?
>That's why you didn't hear about the great email collapse of 2006.

Wait, is this a joke or was there really a great email collapse in 2006?

A few people lost all their emails on Gmail in 2006. https://techcrunch.com/2006/12/28/gmail-disaster-reports-of-...

There have been other Gmail outages in the past though. My two favorites are:

1. The multi-hour outage in 2009 http://www.cnn.com/2009/TECH/09/01/gmail.outage/index.html?e....
2. And one where they had to restore data from magnetic tape backups https://gmail.googleblog.com/2011/02/gmail-back-soon-for-eve....

Of course, there are other email providers too and their own outages. But I like Gmail and that's what I follow.

There have been occasions where, due to a massive routing snafu, the whole internet became unusable. Sprint once botched their backbone so badly it started blackholing all traffic, and on another occasion tons of major routes were inexplicably shifted to China.

The top-level routing is held together with duct tape and lots of carefully trained eyes looking for problems with it.

It was a joke
Slack is down, productivity goes up... After using it for a while, I find it handles communication within a team well but does almost nothing to increase collaboration, so I guess Slack is in an area that apps like Discord will disrupt.
I don't disagree with you, but in what way do you think Discord will "disrupt" Slack with regards to collaboration?
Discord's a compelling alternative, but it's just another flavour of the same. If you want distributed you need something more like IRC or at least federated XMPP.
I don't think making it distributed solves the productivity problem, though.
Err, I find myself wasting much more time in Discord, music bots etc, hah.
Seems to be up for me again. I wonder if 8 minutes of downtime is enough to warrant a post-mortem blog post.
Eh, I don't think a post-mortem is really necessary. They said it was a broken code change on the status page. If they wrote a post-mortem, it'd probably just say "A team member forgot to foobar the bazqux when deploying an update. We immediately followed our playbook for rolling back a failed deployment and restored service within 9 minutes."
Given the increasing dependence on messaging platforms, I think any widespread downtime deserves a public post-mortem.

And that's not to speak to the amazing amount of curiosity and interest that any downtime in a large public system generates. From the PR side, I would think that some kind of post-mortem is almost necessary to prevent that curiosity and interest from turning to distrust and negative perception.

For a product as widely-used as Slack is, it might at least be interesting.
Of course not.
I stopped using Slack. It is a productivity drain.
I feel like this is about as relevant to this discussion as someone arrogantly interjecting, "Oh I don't own a TV," when talking about Netflix shows.
I've been tempted to stop using it, but instead just shut off notifications. I really like being able to pull it up quickly to see when code has been pushed up, pull requests created / merged, or whatever automated CI action is going on for a particular project.

The chatter can burn a lot of time though. You're absolutely right there.

I didn't have anything to stop notifications last I checked. Had to uninstall the app.
Your show of virtue has been noted by the Ministry of Statistics. Thank you for the data point.
Using MS Teams at the office and I couldn't agree more. It's useful about 30% of the time. The other 70% is sharing giphies and news links.
I wish they had an option to only notify me of images and gifies. I have no idea what the other stuff is about..
We solved this by creating a separate channel for gifs and news. Those interested can participate, those not interested can just mute the channel and ignore it.
Do you only call / talk to people in person at your job? In my experience that is much more disruptive than an instant message.
Working as a publisher and having channels to talk to each of our clients is amazing and increases productivity and communication way beyond just email and phone. But that's just my particular use-case :)
I have two jobs, the one without Slack is much harder to stay on top of.

I just mute the gif-sharing channel.

https://status.slack.com/2017-03/c0923f37c54988ec - basically just said that there was a broken code change which they reverted.
>The status.slack.com was also overloaded during this time and it may have been inaccessible.

This is starting to become a common theme...