by thanksforfish 2262 days ago
This is a good point, but it could be worded better.

One of the things a site reliability engineer should think about is how well the site can be operated when dependencies have issues. After an incident like this, even if you were able to recover, it's worth thinking about how things could have gone better.

In the past I had a painful experience with one application I was supporting that needed to install NPM packages on deployment. We couldn't successfully deploy (or scale up) for the duration of that outage. In that case we realized it was safer to switch to server images with all assets pre-installed and an NPM cache to give the build a better chance of succeeding. The next NPM outage we only noticed after the fact :)
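A rough sketch of the "give the build a better chance" idea, assuming the server image already ships a populated NPM cache directory. The function names and the cache path are hypothetical, not what we actually ran; `npm ci`, `--offline`, `--prefer-offline` and `--cache` are standard npm options. The deploy step tries a cache-only install first and only touches the registry if that fails:

```python
import subprocess
from typing import Callable, List, Sequence

Command = Sequence[str]


def install_attempts(cache_dir: str) -> List[List[str]]:
    """Commands to try in order: cache-only first, then cache with registry fallback."""
    return [
        # Never touch the network; succeeds only if the cache is complete.
        ["npm", "ci", "--offline", "--cache", cache_dir],
        # Use cached packages where possible, hit the registry for the rest.
        ["npm", "ci", "--prefer-offline", "--cache", cache_dir],
    ]


def run_command(cmd: Command) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd), check=False).returncode == 0


def install_dependencies(
    cache_dir: str,
    runner: Callable[[Command], bool] = run_command,
) -> List[str]:
    """Return the first install command that succeeds, or raise if none do."""
    for cmd in install_attempts(cache_dir):
        if runner(cmd):
            return cmd
    raise RuntimeError("all install attempts failed")
```

With a warm cache the first attempt succeeds and a registry outage never enters the picture; the `runner` parameter is only there so the logic can be exercised without npm installed.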

I'm not certain how this particular deployment pipeline is failing due to the GitHub outage, but a post-mortem to discuss it may be helpful and could protect against future issues.


To your point on NPM dependencies, using CDNs for JS libraries is another common practice that I only recently learned about and don't fully understand. It seems like if the CDN goes down then your application stops working, but I am convinced I'm missing something here, because that seems like such poor engineering judgement. It seems to be especially common in SPAs, which is precisely where (it seems to me) you shouldn't be using a CDN.

It seems the convenience of cloud-based deployment pipelines is not really worth situations like this.

People do it because it is faster (browsers limit the number of concurrent requests per domain, so serving assets from a separate CDN domain allows more parallel downloads) and cheaper.
Specifically for JavaScript frameworks, using a popular CDN increases the chance that the browser already has the asset in its cache. Browser cache is a huge win for load times.

Static file hosting via a managed CDN is a fairly reliable option, better than many companies can build on their own.