Hacker News
by bricestacey 4419 days ago
By the time you're talking about 99.9% uptime, there are only about 9 hours of downtime a year, so there isn't much wiggle room for failure. No one person should assume they can do better than a third party on their own, because you're likely asleep for 2920 hours a year. Basically, you're never good enough on your own. If you care about uptime, you're going to need to pay good money for it.
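
For reference, the downtime budget behind those numbers can be worked out directly. This is only a quick sketch, assuming a 365-day year and 8 hours of sleep a night; the function names `downtime_hours` and `sleep_hours` are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(availability: float) -> float:
    """Hours of allowed downtime per year for an availability given as a fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return HOURS_PER_YEAR * (1.0 - availability)


def sleep_hours(hours_per_night: float = 8.0) -> float:
    """Hours per year a single operator is asleep (8 h/night is an assumption)."""
    return hours_per_night * 365


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        budget = downtime_hours(target)
        print(f"{target:.2%} uptime allows {budget:.2f} h/year of downtime")
    print(f"one person sleeps roughly {sleep_hours():.0f} h/year")
```

At 99.9% the budget comes to about 8.76 hours, which is where the "about 9 hours" figure comes from, and it is tiny next to the 2920 hours a lone operator spends asleep.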

I think the general consensus is that the case for running infrastructure in-house isn't superior uptime but superior control.

Generally, you don't care if GitHub [or your inhouse equivalent] is down when no one is working. Also, if it goes down in the middle of the night and the first guy in fixes it, that costs about 1 man-hour in the morning, versus 20 man-hours if it goes down during the day across an entire team. That difference is significant.

There are very valid reasons for both choices.

The feeling that everything is under control...
> Generally, you don't care if GitHub [or your inhouse equivalent] is down when no one is working.

It is fortunate, then, that all GitHub customers are in the same timezone!

A global service should have at least 3 teams (Asia, Europe, America); 4 teams would be better. They can work in shifts: the Asia team joins the Europe team's standup meeting, then goes home, and so on.

It might not be affordable for day-to-day work, but it's perfect for critical issues.

But, hypothetically speaking as of now, could this be automated? And if so, how?
In short, no, unless you think people replacing broken hardware or patching bugs in your software can be automated.

Your question is too vague. GitHub is up enough that I don't care. However, it's down often enough that I wouldn't want to be unable to deploy because of an outage. Therefore, I may mirror my repo somewhere else. That's easy because git is decentralized. It's a lot cheaper than running some alternative that I have to guarantee is always running. You can do this by simply pushing to a second, mirror remote.
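
A rough sketch of that mirroring step, driving the git CLI from Python. The remote name `mirror` and the URL are placeholders, not anything from a real setup. `git remote add` and `git push --mirror` are standard git commands, but be aware that `--mirror` pushes every ref and also deletes refs on the mirror that no longer exist locally, so only point it at a repository dedicated to mirroring.

```python
import subprocess
from typing import List


def mirror_commands(mirror_url: str, remote_name: str = "mirror") -> List[List[str]]:
    """Build the git commands that register a second remote and push every ref to it."""
    return [
        ["git", "remote", "add", remote_name, mirror_url],
        ["git", "push", "--mirror", remote_name],
    ]


def push_to_mirror(repo_dir: str, mirror_url: str, remote_name: str = "mirror") -> None:
    """Add the mirror remote if it is missing, then push all refs to it.

    Raises subprocess.CalledProcessError if any git command fails.
    """
    add_cmd, push_cmd = mirror_commands(mirror_url, remote_name)
    # "git remote" lists the configured remote names, one per line.
    existing = subprocess.run(
        ["git", "remote"], cwd=repo_dir, check=True, capture_output=True, text=True
    ).stdout.split()
    if remote_name not in existing:
        subprocess.run(add_cmd, cwd=repo_dir, check=True)
    subprocess.run(push_cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical mirror location; replace with a host you control.
    for cmd in mirror_commands("git@mirror.example.com:me/project.git"):
        print(" ".join(cmd))
```

Running `push_to_mirror` from a post-push hook or a cron job would keep the copy fresh; if GitHub is down at deploy time, you deploy from the mirror instead.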

If you're just interested in the subject, research high availability.

Redundancy. Lots of it. A simple example would be hitting your website with a script every so often and, when it stops responding, updating your DNS to point to another data center that's up and ready to go.
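
Something like the following is what that script could look like. It is a sketch under assumptions, not a production failover system: the health-check URLs are whatever you choose, and `update_dns` is a hypothetical callback you would write around your DNS provider's API (no real provider API is shown here).

```python
import urllib.request
from typing import Callable, Optional, Sequence


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url succeeds within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Any failure (connection refused, timeout, 4xx/5xx) counts as "down".
        return False


def pick_datacenter(
    candidates: Sequence[str], probe: Callable[[str], bool] = is_up
) -> Optional[str]:
    """Return the first candidate, in priority order, that passes the probe."""
    for url in candidates:
        if probe(url):
            return url
    return None


def failover_once(
    current: str,
    candidates: Sequence[str],
    update_dns: Callable[[str], None],
    probe: Callable[[str], bool] = is_up,
) -> str:
    """Run one check; repoint DNS only if the healthy target differs from current.

    update_dns is a hypothetical hook around your DNS provider's API.
    Returns the data center that DNS should now be pointing at.
    """
    target = pick_datacenter(candidates, probe)
    if target is None or target == current:
        return current  # nothing healthy to switch to, or no change needed
    update_dns(target)
    return target
```

You would call `failover_once` from cron or a loop with a sleep. Keep in mind that resolvers cache DNS records for their TTL, so clients may keep hitting the dead data center for a while after the switch, and the monitoring script itself needs to run somewhere that doesn't go down with the primary.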