by dysfunction 1985 days ago
My coworker's theory was that someone was waiting for the holiday's end to deploy something risky.

And I'm in that boat of depending on Slack for alerting... in fact my team was also waiting over the holidays to deploy more robust non-Slack-based alerting (in our defense the product is only a few months old and only now starting to scale to any real volume).

3 comments

I wouldn't be surprised if it's actually a combination of a new feature being recently rolled out, along with the sudden spike in load this morning.

The holidays are actually the perfect time for Slack to roll out a risky deployment, as it has to be their lowest usage time. So it would make sense if something was pushed out last week or the week before. And everything probably seemed fine.

And then this morning they suddenly realize this new feature does not perform under load. And to make matters worse, the new feature has been out long enough to make any sort of rollback very tricky, if not impossible. Which means they'd need engineers to desperately hack out, test and deploy a code fix.

If this is the scenario, I do not envy them at all.

Holidays are a good time for a company to do a risky deployment, but a bad time for an individual employee to do a risky deployment, assuming one doesn't want to work overtime over the holiday fixing things.
Depends on how well compensated holiday overtime is. There are some employees happy to work overtime if their hourly pay is doubled or tripled. However, there are also those who wouldn't do that for any price.
Depends how badly it goes wrong. My org is a 24/7 one, but one Christmas back in the 90s (way before my time) some work was done on Christmas Eve. I think it was on the phone system, in the days before widespread mobile phones.

It broke, which was a major problem. This meant that senior management were being phoned (ho), and relatively high middle managers were on site to deal with the fallout. Of course, most suppliers were also closed, so everything was harder to fix.

There are good reasons not to make changes when places are closed, or at least running on a skeleton staff, for 2 weeks.

This depends on how easy or difficult the rollback strategy is.
Not a bad theory.

I used to work for a place that had a FY that ended in summer. We had a lot fewer problems with stuff being shoveled out the door at Thanksgiving and Christmas because nobody was trying to finish their year-end performance goals over the holidays.

I think what I'm implying is that management creates this issue, but we are complicit.

Yeah, I think it's this rather than load. Slack should be able to handle load fine (probably), but since this is the first weekday post-holidays I imagine some deployment broke something.