|
> On January 4th, one of our Transit Gateways became overloaded. The TGWs are managed by AWS and are intended to scale transparently to us. However, Slack’s annual traffic pattern is a little unusual: Traffic is lower over the holidays, as everyone disconnects from work (good job on the work-life balance, Slack users!). On the first Monday back, client caches are cold and clients pull down more data than usual on their first connection to Slack. We go from our quietest time of the whole year to one of our biggest days quite literally overnight. What's interesting is that when this happened, some HN comments suggested it was the return from holiday traffic that caused it. Others said, "nah, don't you think they know how to handle that by now?" Turns out occam's razor applied here. The simplest answer was the correct one. Return-from-holiday traffic. |