by everfrustrated 1555 days ago
Does anybody else remember when GitHub's outage page used to have little graphs showing downtime?

Eventually they took it down as their outages were just too frequent.

GitHub has _always_ had terrible uptime. It's a great product - wish something would change but it seems cultural at this point.


They had massive problems with their main database cluster (MySQL). If you read through their engineering blog, most of the outages were related to their growth and the main database cluster. They moved workloads for some features to different clusters, but that only buys more time. Eventually they'll do proper sharding (by user or org I guess, not by feature) but that takes time.

Their engineering blog is full of articles about MySQL and the main "mysql1" database cluster, e.g. https://github.blog/2021-09-27-partitioning-githubs-relation...
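
To make the "by feature" vs. "by user or org" distinction concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the cluster names other than "mysql1", the shard count, and the hashing scheme are hypothetical and not taken from GitHub's actual setup.

```python
import hashlib

# Hypothetical feature-to-cluster map; only "mysql1" is named in the thread.
FEATURE_CLUSTERS = {
    "notifications": "mysql-notifications",
    "gists": "mysql-gists",
}
DEFAULT_CLUSTER = "mysql1"
NUM_SHARDS = 4  # assumed shard count, for illustration only


def cluster_for_feature(feature: str) -> str:
    """Partitioning by feature: all data for a feature lives on one cluster.

    Anything not explicitly moved out stays on the main cluster, so the
    main cluster still grows with the largest remaining feature.
    """
    return FEATURE_CLUSTERS.get(feature, DEFAULT_CLUSTER)


def shard_for_org(org_id: int, num_shards: int = NUM_SHARDS) -> str:
    """Sharding by org: every row belonging to an org maps to the same shard.

    A stable hash (not Python's per-process randomized hash()) keeps the
    mapping consistent across processes and restarts.
    """
    digest = hashlib.sha256(str(org_id).encode("utf-8")).digest()
    return f"shard-{int.from_bytes(digest[:8], 'big') % num_shards}"
```

With the first function, a single hot feature is still limited to one cluster. With the second, load spreads across shards as orgs are added, at the cost of making cross-org queries harder.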

i've noticed this too .. the real head-scratcher is how a solid chunk of github's db & infra folks left to join a database startup, one of them even becoming its ceo!!

if they had made github db/infra super-stable before this, it would be a vote of confidence in their new company, but instead imho it is the opposite

DB and infra folks are often tasked with shoveling shit uphill, and aren't in total control over how data or schemas get organized.
that's fair. i am just raising an eyebrow to github's apparent lack of sharding, as described in their incident reports -- while these engineers all left to join a db company that focuses specifically on sharding -- it seems like an experience mismatch.

if they were all sharding experts, why wasn't github sharded properly? other large mysql shops have solved this, all the way back to the days of yahoo and flickr and livejournal

Which one are you referring to?
maybe i shouldn't have mentioned it, i don't want to name names and have this come off as an off-topic attack subthread about a different company, sorry! it's a db company that has raised a lot of money and is mentioned on hn a lot, there are only a handful of these
my guess is:

    rot13 cynargfpnyr
I have no idea if this is remotely close to reality, but what if their culture of breaking things and bad uptime is what allowed them to move fast and build a great product in the first place?
GitHub was founded in 2007. They were acquired by MS years ago. They should be well beyond any startup culture of "move fast at the expense of reliability".
I don't disagree with this; they could/should have transitioned already. But for one, cultures are hard and slow to change. And second, as an example, Facebook had the motto "move fast and break things" until 2014, and by that time they were also beyond the startup phase (*), so this kind of culture is not only for the early days.

(*) They were founded in 2004, so that's 10 years in. By 2014 they had 800M+ monthly active users and $12 billion in revenue, and they had this culture internally until that point.

Facebook is a social media app that hardly anyone (except for advertisers) pays for.

GitHub is an enterprise product crucial to tons of businesses.

Cultural comparisons between the two really shouldn't apply.

Aren't both companies potentially losing money when their products don't work? The fact that it's crucial to businesses seems to be the client perspective, not the company perspective. It could also be seen as critical for some businesses to advertise on Facebook. This could call for a different culture internally, but I'm not convinced this is necessarily the case.