GitHub incident 2022-03-23

	GitHub incident 2022-03-23 (githubstatus.com)
	273 points by tpaksoy 1548 days ago

22 comments

blueplanet200 1548 days ago

I hope they figure out what’s going on every morning. Heard from inside they don’t know why the db dies every day, but restarting it fixes it.

exikyut 1548 days ago

What's "the db"? It sounds like something of small to medium scale if you can just restart it like that.

In any case, why not just relocate some vendor engineers on site for a bit? Or, better, why does the vendor not have a small presence in the corner?

Sounds like whatever "the db" is it's probably some (objectively) small but very scary thing that's currently on fire and people are trying to figure out how to put it out without crashing the plane and also making too many waves internally, which is probably even harder. So asking about making vendor noises is (as useful as it may be) probably going down the wrong path - in much the same way this is probably not related to the outages (it may well be, but from the outside it's all coincidence anyway).

fundmondawyaya 1548 days ago

Cock crows. DB crashed.

Systemctl restart

mysqld

(Or mariadb, if you pronounce "SQL" as "sequel")

yebyen 1548 days ago

Sounds like it was a MySQL database:

https://github.blog/2022-03-23-an-update-on-recent-service-d...

shepardrtc 1548 days ago

IIS Server had/has a memory leak in worker threads that many years ago always forced us to restart the server every few days. Starting in 6.0, they added worker thread recycling and made it mandatory to choose a time period for every thread to be recycled. Why fix the error when you can just restart the service?

djbusby 1548 days ago

Apache prefork had that since forever. Seems just a garbage collect type pattern.

mst 1548 days ago

For old-school mod_perl apps setting MaxRequestsPerChild was often a much better ROI than actually finding and fixing the leaks.

Speaking as somebody who's done over a decade of large scale OO Perl applications and is actually really good at finding and fixing the leaks, this has often been intellectually aggravating but every time I've set that option instead I rewarded myself with a glass of bourbon for picking the pragmatic choice and then went back to adding (non-leaky) features that were far more useful to the company in question than cleaning up the older code would've been.

shepardrtc 1548 days ago

It's not a bug, it's a pattern.

Seriously though, IIS 5.0 had no worker recycling. There was no method to fix the issue. Threads would eat up GB's of memory until you killed them.

whimsicalism 1548 days ago

I doubt they use IIS

throwra620 1548 days ago

MSer here, yes we do… for some things

prepend 1548 days ago

For GitHub? It seems unbelievable that they would use IIS pre-purchase and why in the world would you mix in a second web server for post-purchase enhancements.

Yuioup 1545 days ago

Why trade an open source solution for third rate garbage called IIS, which runs on a sub-par desktop OS called Windows? I thought that Github was supposed to be independent.

whimsicalism 1548 days ago

If GH is around the same level of integration with Microsoft as my employer, which is another Microsoft acquisition, I don't really believe you have a ton of insight into GH processes.

edgyquant 1548 days ago

I dated a girl at GitHub for a while last year who said they weren’t even completely off of AWS yet and she liked how it didn’t feel like working for Microsoft. Maybe this has changed though.

cube00 1548 days ago

Break out the early morning restart cron job.

gaoshan 1548 days ago

Here you go, Github:

0 4 * * * /etc/init.d/postgresql restart

I'll take an architect position as compensation, but only if there is equity.

rish 1548 days ago

GitHub uses MySQL primarily though.

grumple 1548 days ago

MySQL also has a restart command! I'll take my rsus now ty.

Kostic 1548 days ago

Early morning in which timezone?

afterburner 1548 days ago

GaryOldman.gif

glenneroo 1548 days ago

When the least amount of users are online?

MuffinFlavored 1548 days ago

How long does restarting it take?

raffraffraff 1548 days ago

Yuck. Honestly, restarting a database to fix a major outage sounds like "we have no idea what we're doing"

blueplanet200 1548 days ago

It sounds like "they don't know why it's going down." I've worked with plenty of super competent people who have taken time to root cause incidents.

Guide to incidents:

Step 1: Stop the bleeding

Step 2: Prevent it in the future

Doing Step 1 doesn't make you incompetent.

raffraffraff 1547 days ago

I'm not a DBA, and maybe you're not a DBA either, so this question goes to DBAs who may be reading: aren't you always better off killing the bad queries instead of rebooting the whole box, if that's an option? (ie: aside from times when the entire host is screwed, load per core is >50, metrics aren't getting out, you can't ssh in etc)

bpicolo 1548 days ago

Sporadic database performance issues can certainly make you feel that way. They are definitely not trivially debugged at scale

vimda 1548 days ago

Would you rather it stay down while they spend a day debugging it?

paulryanrogers 1548 days ago

If that means it won't be down every morning in my time zone then yes.

seanw444 1548 days ago

As long as it's announced in advance so that users/customers can plan ahead, I don't see why not.

karmakaze 1548 days ago

They could use multiple writer hosts and roll the restarts across them. MySQL has had GTIDs since 5.6 and Group Replication, rather than a single writer with replicas, since 5.7.17.

iBotPeaches 1548 days ago

It seems like we haven't had a non-robot status update on the status page in days, even though this has become what seems like a daily occurrence. I figure at this point we'd get something of why this is happening.

I also don't appreciate our builds freezing, unable to be cancelled and then eating up hundreds of minutes.

lucasyvas 1548 days ago

Billing should always be built on a "ping" IMO and not start/stop hooks. The latter is shockingly bad for customers during times of unreliability. The former sounds stupid and requires more infrastructure from the one offering the service, but I think it's more fair.

I haven't used GA in a way where it actually cost me anything, but having minutes just tick away while you can't do anything is really stupid if that's the case.

Edit: Another sane solution would probably be to record outage periods and have Billing automatically reconcile for every customer when invoicing. This would require them to admit the outage durations however, so it may be flawed from a human perspective.
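
A rough sketch of how the two billing models in this comment could differ during an outage. Everything here is hypothetical: the 60-second heartbeat interval, the function names, and the timeline are illustrative assumptions, not a description of how GitHub actually bills Actions minutes.

```python
from typing import List

HEARTBEAT_INTERVAL_S = 60  # assumed ping interval, purely illustrative


def billed_seconds_start_stop(start_ts: float, stop_ts: float) -> float:
    """Start/stop hooks: everything between the two events is billed."""
    return stop_ts - start_ts


def billed_seconds_ping(ping_timestamps: List[float]) -> float:
    """Ping model: bill one interval per distinct heartbeat actually received."""
    return len(set(ping_timestamps)) * HEARTBEAT_INTERVAL_S


# Hypothetical job: starts at t=0, freezes during an outage from t=300 to
# t=3900 (no heartbeats arrive), and is finally stopped at t=4200.
pings = [float(t) for t in range(0, 300, 60)] + [float(t) for t in range(3900, 4200, 60)]

start_stop_bill = billed_seconds_start_stop(0.0, 4200.0)
ping_bill = billed_seconds_ping(pings)

print(start_stop_bill / 60, "minutes billed with start/stop hooks")
print(ping_bill / 60, "minutes billed with pings")
```

In this made-up timeline the start/stop model charges for the full 70 minutes, while the ping model charges only for the 10 minutes in which the job demonstrably made progress. The cost is that the provider has to receive and store every heartbeat.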

drusepth 1548 days ago

The "ping" solution is an interesting one that I haven't seen proposed before.

At what rate would you do these pings? I don't know how upgrading/downgrading works at GitHub but if they do any sort of refund/credit when you downgrade, it seems like there's some interesting implications for abusing the system (e.g. upgrading/downgrading between pings for "free" service if the time between them is too long) versus performance (e.g. how do you update all users per ping in a timely manner if the time between them is too short?).

Would love to read up more on this approach; seems interesting!

MattIPv4 1548 days ago

> I figure at this point we'd get something of why this is happening.

I've created a new discussion in their feedback repo asking for this, three major outages in a week could really do with a post-mortem: https://github.com/github/feedback/discussions/13344

mhitza 1548 days ago

I suggest you add the timeout-minutes property on the job/step, so even if the web interface isn't responsive the job times out eventually. Saves you from spending time emailing support about consumed minutes.

Of course, assuming that a future bug won't affect timeout-minutes itself.

easton 1548 days ago

Do they give you the minutes back if there's an incident during the period where a job is running?

no_wizard 1548 days ago

You will have to contact them for them to credit you, that's what we did

lucasyvas 1548 days ago

This is totally unsurprising and also totally unacceptable IMO. They should automatically wipe out all build minute usage during outages for every account if they insist on architecting their system in this way.

mfashby 1548 days ago

I'm inclined to look at tools like fossil again, for its distributed issue tracking and wiki capability

https://fossil-scm.org/home/doc/trunk/www/index.wiki

JonChesterfield 1548 days ago

Fossil is faultless for a team size of one. I've been using it for nearly a decade, doing totally non-optimal things like using versions released years apart on different OSs with the same database. I also ctrl-c it when I spot a typo in a commit message and check in binaries. Never missed a beat.

As headcount goes up I think the inability to locally rewrite history into easily reviewable patches would be sorely missed. So it's git for team stuff and fossil for my own.

edgyquant 1548 days ago

I had forgotten about that, thanks!

koolba 1548 days ago

I really wish they would add the word “outage” to these titles.

“Incident” alone makes me think something got hacked or leaked.

arez 1548 days ago

That's SRE lingo --> https://sre.google/sre-book/managing-incidents/

zufallsheld 1548 days ago

It's also itil lingo, which predates sre.

mtnops 1548 days ago

It's NIMS - FEMA lingo, which predates ITIL. Which was developed in USFS wildland firefighting, which predates FEMA. It's incident management all the way down.

zacharynewton 1548 days ago

"The Simpsons already did it"

mirekrusin 1548 days ago

Status page says only degraded performance.

It's a nice way of putting it.

I've been trying to run a github action for a couple of hours now. They don't work at all. But apparently this means they run, but in infinite time, hence == degraded performance, nice.

raffraffraff 1548 days ago

It's just a way to avoid SLA breaches. "Of course it wasn't down! It was just infinitely slow!"

okareaman 1548 days ago

What's the difference between GitHub and GrubHub?

GrubHub delivers

darknavi 1548 days ago

Watching Lion King as a youth I always thought grubs looked delicious.

Little did I know...

jadbox 1548 days ago

First HN comment that ever made me laugh, well done.

bob1029 1548 days ago

We are scheduling a call with an enterprise sales person next week.

If I can get all the Github features I had as of ~2020, but on an instance that won't get hit by the public cloud/update bus, I would be exceptionally happy.

The only complaints we have are regarding availability. If we can fix that one problem, this is a perfect product in our view.

andruby 1548 days ago

How do you evaluate running your own gitlab instance?

jonnybarnes 1548 days ago

2nd day in a row isn’t it?

stepri 1548 days ago

And 6 days ago: https://news.ycombinator.com/item?id=30711269

momothereal 1548 days ago

Yes: https://news.ycombinator.com/item?id=30767635

rvz 1548 days ago

It is. 24 hours later [0] and I only expected it to happen once every month. Looks like it is getting worse.

Oh dear. Not a good idea to go 'all in' on GitHub.

[0] https://news.ycombinator.com/item?id=30767821

fishnchips 1548 days ago

Yesterday they had two.

Xarodon 1548 days ago

This has been a pretty rough week for GitHub

stuff4ben 1548 days ago

Github Enterprise hasn't been faring too well at my work either this week. When you work on both open and closed source products and GH and GHE are both down, it leads to a very unproductive week.

jrowley 1548 days ago

Does GitHub enterprise result in dedicated instance or any better availability?

jon-wood 1548 days ago

It Depends.

GitHub Enterprise is confusingly both a "call us for pricing" tier of GitHub the website, and also an on-premise version of GitHub that you can run as an appliance in your own data centre. The first of those is ultimately just GitHub and so has the same outages, the second is running on your own hardware so it isn't (or at least shouldn't be) tied to the website's availability.

bewuethr 1548 days ago

There are multiple products: self-hosted (Enterprise Server) and hosted by GitHub (Enterprise Cloud). I don't know about uptime guarantees, but you can buy Premium or Premium Plus support with 30-minute SLA or a dedicated account manager.

cube2222 1548 days ago

Looks like they really want to get a PR deployed, but there's still not enough duct tape on it.

nimbius 1548 days ago

https://www.githubstatus.com/history

21 incident outages in just 3 months. At this rate the benefits of running your own gitea or gitlab are starting to become competitive.

edgyquant 1548 days ago

I’m not sure at what organization that is true. My company lives out of GitHub and Jira and I’ve hardly noticed the three month surge. GitHub would have to do a lot worse to get many companies to want to host their own services. This is the argument people have said about the cloud from day one.

People want to know it isn’t their problem, that makes cloud computing (and things like GitHub) worth their weight in gold. I have real problems to solve I don’t want to deal with a git repo manager on top of that.

thaeli 1548 days ago

Also, looking at this it seems like GitHub isn't doing the common SaaS thing of just lying on their status page. Many providers, both internal and external, would look a lot worse if they had honest status pages.

mirekrusin 1548 days ago

They are green for a good 15 minutes from the first moment I see problems; it's not the first time, it actually happens quite often. Maybe that's the time they need to confirm/cross check/write the status update, don't know.

georgemcbay 1548 days ago

While quicker reporting would be better, 15 minutes is anecdotally a lot better than I see from most other services where their status pages will report all-clear hours into full outages.

thaeli 1548 days ago

Yeah, I'm legit impressed with a 15 minute time here.

judge2020 1548 days ago

They probably allow regular SREs to trigger an incident on the status page on their own, when the likes of AWS and other bigger cloud providers are rumored to need approval from a VP[0] to update the status page.

0: https://news.ycombinator.com/item?id=29475756

jhugo 1548 days ago

Several of the recent outages were much longer (at least for us, here in Asia) than they admitted on their status page. In one case I started work, noticed I couldn't push to or pull from GitHub, that situation persisted all day, and around 5pm local time (so morning-ish in the US) suddenly their status page acknowledged the problem and a discussion started on HN.

ishanjain28 1548 days ago

They do, intentionally or not, lie about this on their status page. From December 25th to December 31st 2021, Github actions had network problems almost every single day for hours and the status page was green throughout that period.

Same thing also happened a few months back.

It feels like they do this manually and it's only done when enough people are affected.

wnevets 1548 days ago

> I’ve hardly noticed the three month surge.

This has been my experience as well. I don't know if that means GitHub is being overly transparent about issues or I've just been lucky but I would hate if people punished services for being transparent and informative on their status pages.

zenexer 1548 days ago

GitHub's outages have hit me hard over the past week or so. I don't think it's a matter of them being transparent--if anything, I was hitting errors well before their status page updated. Yesterday it was completely unusable for much of my workday, and today tasks that normally take me a few minutes have been taking hours.

SkyPuncher 1548 days ago

> I’m not sure at what organization that is true. My company lives out of GitHub and Jira and I’ve hardly noticed the three month surge

These have been minor inconveniences for us - at worst. Most of the time it simply means people jump to something else then come back later in the day.

Failing tests and PR feedback cycles are more of a blocker to our team than these outages.

gjulianm 1548 days ago

At my organization it's always been true. Setting up GitLab is fairly easy, in my company we do it and it's cheap (on-prem hosting is basically zero, and we had the IPs/domains already) and it hasn't given us too many headaches. I think last time I had to do something was maybe a few months ago when I restarted it so that it picked up the updated SSL certificate.

nightpool 1548 days ago

Self-hosted GitLab got a good callout yesterday from Microsoft, it appears to be a favorite of LAPSUS$: https://www.microsoft.com/security/blog/2022/03/22/dev-0537-...

Self-hosting always increases the operational burden of making sure your systems are secure. Maybe you have the engineering resources to spend on patching everything immediately and conducting in-house pen tests, but for most companies it's much, much more secure to let the software's developers host it as well.

temp8964 1548 days ago

Not necessarily. Self-hosted services are protected by company firewall / VPN. They can setup very restrictive network access. They don't have the same level of risks as public services like GitHub or GitLab.

hhh 1548 days ago

Establishing an entry point via VPN is Lapsus$ primary first step.

Melatonic 1548 days ago

Except that the software developers hosting is also a much, much bigger target and you generally do not have any real control over how often they are patching either.

skeeter2020 1548 days ago

>> Setting up GitLab is fairly easy, in my company we do it and it's cheap (on-prem hosting is basically zero, and we had the IPs/domains already)

In what tech company is hosting or domains the main cost centre? Many companies spend more on a single hour of a dev's time than their entire GH monthly bill.

capitol_ 1548 days ago

I think we pay about $10 per developer per month for github, and with about 1000 developers I would love that hourly rate.

kleebeesh 1548 days ago

...What? $10 x 1000 = $10k / month. $10k x 12 = $120k. That is a new grad software engineer salary in any US city. You'd pay more than that for a single dev with the devops and security experience to keep GHE running and patched for 1000 devs.

xondono 1548 days ago

I’d say it depends, I run my own on prem server and gitlab was a PITA. Too many moving parts, updating took too much of my time, and I never felt “safe”.

Moving to gitea solved all of those issues for me (thus far), now I’m looking into adding other stuff like CI through Drone.

Inversechi 1548 days ago

Did you consider woodpecker instead of drone? It's basically an evolved fork of the OSS version.

https://woodpecker-ci.org/

xondono 1548 days ago

Didn’t even know about it. I’ll check it out.

Thanks!

KronisLV 1548 days ago

Curiously, this was also my own experience!

I actually wrote a bit about the migration process, as well as the reasons for migrating over to Gitea, Nexus and Drone CI as opposed to using GitLab, GitLab Registry and GitLab CI: https://blog.kronis.dev/articles/goodbye-gitlab-hello-gitea-...

With containers, it's actually a pretty good experience that's not too hard to setup or manage.

edgyquant 1548 days ago

It definitely depends. We’re pretty early stage and I’m the senior engineer+infrastructure guy so running our own gitea instance or whatever is just more time that I’m almost out of.

jeltz 1548 days ago

Maybe you are in a different time zone because our organization certainly noticed and was disrupted by this.

edgyquant 1548 days ago

I’m on PST time, some of our other devs are on the east coast and one is in India. I think we’re spread out enough it should be an issue but maybe we prioritize different things.

jeltz 1548 days ago

We are in CET and maybe we use Github differently than you.

jhugo 1548 days ago

I think the impact was for some reason not consistent between users (maybe due to geographical factors or maybe sharding of accounts?). We're in Asia and I think we've had three different days recently where we couldn't actually get much work done due to GitHub being flakey or down for the entire day and our CI/CD and development processes being built around it. We ended up moving off GitHub onto a self-hosted system, which took about a day of work for one engineer (CI/CD itself was already self-hosted, so just Git, issues and PRs), and there have already been two more GitHub outages since then.

jmartens 1548 days ago

My company monitors the functionality, performance and availability of apps like Github, and we have certainly noticed the increase in issues lately.

edgyquant 1548 days ago

We were actually talking about implementing this last week. Not for GitHub but for slack as it seems to have issues once a month or so.

ManWith2Plans 1548 days ago

I will say that for us this is a huge deal. We're a devops services company, and our customers expect their deployment pipelines to work. This is becoming a huge pain-point for a few of our customers and we recommended Github Actions to them. A couple of our customers want us to move away from GitHub actions because of how disruptive outages have been.

jacobr 1548 days ago

20 PRs waiting in line for half a day to be merged is pretty annoying. We’ve had that on multiple occasions the last few weeks due to GitHub incidents.

mrkurt 1548 days ago

If you want companies to be honest on their status pages (I do!), you can't just count incidents like that. Status pages can be an amazing place to communicate all kinds of problems.

Most issues have a relatively narrow impact, but the impacted people _still_ benefit from seeing them listed.

jmartens 1548 days ago

How can we solve this as customers, or push the vendor to do better?

mrkurt 1548 days ago

Use vendors who do a good job communicating status, basically. I don't think you can change AWS behavior. But if you find a hosting company who does an amazing job with their status updates, put some apps there (_my_ company does an ok job with status page updates, we're getting better, it's not amazing yet).

encryptluks2 1548 days ago

Stop being a customer of crappy vendors

copperx 1548 days ago

What cloud provider does better status pages than AWS?

drusepth 1548 days ago

The snarky answer is "literally all of them", but one real answer is that I've been pretty happy with GCP's status reporting for the past year-ish I've used them. I've only noticed a few incidents, but every time I've checked the status it was already updated. They also occasionally provide workarounds on the live incident pages if you need to be back up before the issue is fixed on their end.

mst 1548 days ago

stop.lying.cloud is the accurate AWS status page.

gwbas1c 1548 days ago

> At this rate the benefits of running your own gitea or gitlab are starting to become competitive

When you host things yourself, you still have downtime. And, having worked with Github for over a decade, the actual disruption to my work from downtime is much less than if I had to host my own.

That being said: I briefly worked for a company that hosted its own source code control system. For us, as a small team, it wasn't worth it. The system was outdated and hosted in an insecure manner. No one ever did any "admin" work except the founder. He ran it because he had irrational fears of switching, not because of any tangible advantages over Github (and competitors.)

Keep in mind that Github (and competitors) are often cheaper than the time needed to invest in hosting your own. (Estimate 10-20 hours a year of invested time. Calculate your hourly rate. Github and competitors are cheaper.) In order to come ahead, you need tangible benefits other than "I think I can have less downtime."

megous 1548 days ago

Dunno, I got blocked from my work SaaS hosted gitlab for about a month by cloudflare. Nobody at gitlab or cf helped. I only figured it out myself after about 4 hours of research: it was caused by some web tracking APIs (disabled by me years ago) that no-one should have a hard dependence on.

I certainly would not have this problem on self hosted instance, because it would not be behind CF. I'm sure I'd have other problems though. :)

All software is crap. You can be either spending time fixing it yourself, or spending time begging online for fixes/help from some SaaS company/community with resolution time in months, sometimes, all that while you may not be able to use it fully.

Also with SaaS it will be constantly shifting under you. Things will be moved around, restyled, iconized, popupized, etc. This doesn't help productivity either. With self-hosting, you can at least avoid upgrading, if you dislike this kind of thing. Or choose FOSS software that values UX permanency/stability, which seems to be really hard ask from SaaS business.

belter 1548 days ago

Excluding ones reported as [Errors], [Scheduled] or [Notifications]

2019 -> 39 Incidents

2020 -> 67 Incidents

2021 -> 86 Incidents

2022 -> 20 Incidents so far

Edit: Using Linear Regression...Prediction for total end 2022: 111 Incidents.
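
For anyone wanting to check the 111 figure, here is a minimal least-squares sketch. It assumes the fit used only the three complete years (2019 to 2021) and ignored the partial 2022 count; that reading is a guess, although it does reproduce the number quoted above.

```python
# Yearly incident counts quoted in the comment above (complete years only).
years = [2019, 2020, 2021]
incidents = [39, 67, 86]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(years, incidents)
prediction_2022 = slope * 2022 + intercept

print("extra incidents per year:", slope)
print("predicted total for 2022:", round(prediction_2022))
```

With only three data points the fit is obviously fragile, so the prediction is best read as a back-of-the-envelope trend rather than a forecast.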

omoikane 1548 days ago

I wondered if those error rates were proportional to Github's growth over time, so I looked it up. It seems that they have 40M users in 2019[1] and 73M users in 2021[2], which translates to 0.975 incidents per million users per year in 2019 compared to 1.178 in 2021.

So perhaps they are not exactly improving, but maybe there is some other way to normalize the data.

[1] https://github.blog/2019-11-06-the-state-of-the-octoverse-20...

[2] https://octoverse.github.com/
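
The normalization above is simple enough to check in a few lines. The incident counts come from the parent comment and the user counts from this one; the variable names are just illustrative.

```python
# Incidents per year (parent comment) and users in millions (this comment).
incidents = {2019: 39, 2021: 86}
users_millions = {2019: 40, 2021: 73}

# Incidents per million users per year.
rates = {year: incidents[year] / users_millions[year] for year in incidents}
relative_change = rates[2021] / rates[2019] - 1

for year, rate in rates.items():
    print(year, round(rate, 3))
print("relative change:", round(relative_change * 100), "%")
```

So the per-user incident rate rose by roughly a fifth between the two years, which is far less dramatic than the raw count more than doubling.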

mrkramer 1548 days ago

One would have thought that when they got acquired by Microsoft the number of incidents would go down, considering all the resources Microsoft would provide, but no.

speedgoose 1548 days ago

GitHub has a lot more features now though. A few years ago you didn’t have GitHub actions or workspaces, mostly a DDoS from Asia once in a while.

mbesto 1548 days ago

The number of incidents isn't so much of a problem as the amount of downtime is. That would be more interesting to see.

belter 1548 days ago

GitHub Availability Report [1]

Service Downtime Core Services Only - Cumulative per Month

( Some months with more than one outage)

Jan 2021: 3 hours 53 min

Feb 2021: 1 hour 42 min

Mar 2021: 4 hours 10 min

Apr 2021: 2 hours 20 min

May 2021: 10 hours 34 min

Jun 2021: 0 min

Jul 2021: 0 min

Aug 2021: 4 hours 23 min

Sep 2021: 0 min

Oct 2021: 1 hour 36 min

Nov 2021: 2 hours 50 min

Dec 2021: 0 min

Jan 2022: 26 min

Feb 2022: 13 min

[1] https://github.blog/tag/github-availability-report/

mbesto 1548 days ago

So, if my math is right (for 2021 only): 1888 min of downtime / 525,600 min in the year = 0.36% downtime, i.e. 99.64% uptime.

If it was more like 99.80+ I think I would be like "meh", but honestly for the price you pay that's not terrible. Still, for a company at the Microsoft level, it should be 99.80 at least.
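
The arithmetic can be reproduced from the monthly figures in the parent comment. This sketch assumes a 365-day year and treats the listed core-service downtime as the only downtime, which is the same simplification the comment makes.

```python
# 2021 core-service downtime per month as (hours, minutes), from the parent comment.
downtime_2021 = {
    "Jan": (3, 53), "Feb": (1, 42), "Mar": (4, 10), "Apr": (2, 20),
    "May": (10, 34), "Jun": (0, 0), "Jul": (0, 0), "Aug": (4, 23),
    "Sep": (0, 0), "Oct": (1, 36), "Nov": (2, 50), "Dec": (0, 0),
}

total_minutes = sum(h * 60 + m for h, m in downtime_2021.values())
minutes_in_year = 365 * 24 * 60
uptime_pct = 100 * (1 - total_minutes / minutes_in_year)

# Downtime that a 99.80% target would allow over the same year.
budget_9980 = minutes_in_year * (1 - 0.9980)

print("downtime minutes:", total_minutes)
print("uptime %:", round(uptime_pct, 2))
print("minutes allowed at 99.80%:", round(budget_9980))
```

Put differently, a 99.80% target would leave a budget of roughly 1,051 minutes a year, so the 2021 total overshoots it by a bit under 14 hours.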

JonChesterfield 1548 days ago

This is the same Microsoft that reboots laptops in the middle of teams calls to do hour long update cycles. >99% is implausibly good.

ejb999 1548 days ago

that's not the kind of progression you like to see - that is, error rates increasing over time instead of decreasing.

TheRealPomax 1548 days ago

Only if you believe those numbers mean anything. What are the errors for? Github has been adding lots of features and subproducts over the years, becoming a bigger and bigger platform as a result. What you want is the error-per-component, which may very well have actually gone down, with error spikes coming from "when github adds a completely new feature and it goes through a slew of incidents in its first year". The bigger the feature, the more incidents.

Without more detailed numbers, there's literally no conclusion to draw here.

ejb999 1548 days ago

Every place I have ever worked reported incidents going down would be good, not up.

TheRealPomax 1548 days ago

Every place I ever worked at understood that if you x3 the codebase/infra/interaction surface/etc, you can expect x3 errors. If the total number of errors don't go up as you grow you're doing amazing, and if they go down even though you're landing more and more code for more and more features and subproducts, you have a genuine miracle.

quercetumMons 1548 days ago

Reasonable if growth/load is growing, too.

antiquark 1548 days ago

Based on the same extrapolation, github will reach one incident per day by 2032.

TheRealPomax 1548 days ago

But how many of those actually affected you? For example, no amount of issues around codespaces or github packages would impact my professional use of github, so whether there are 21 or 5000 or those parts get permanently taken offline makes no difference in what I need out of the platform.

How many core incidents? The part that affects whether you can even push to and pull from a repo, and access issues and PRs? Because everything else is nice to have, but you can do work perfectly fine without them if they go down for a few hours.

jeltz 1548 days ago

I was affected by the one last week, the one yesterday and the one today. The one today was harmless but the other two disrupted our work. All three were "core incidents", but the one today felt shorter.

CanSpice 1548 days ago

Yesterday's affected me, I couldn't pull or push and when I tried to look at the repo to do PRs I got 500 errors. That only lasted maybe 30 minutes though.

sitzkrieg 1548 days ago

i could not even sso login so it was a bit more impactful than it sounds on the paper

TheRealPomax 1542 days ago

Absolutely, but could you not log in every single time, or just "once this year so far"?

Jiejeing 1548 days ago

If you are a closed org, that is. Running your own gitea or gitlab with registration enabled and having to deal with spam is a real hurdle.

julianlam 1548 days ago

Is it not possible to restrict access to the git server from a VPN server only?

Just off the top of my head, that's one thing you can do.

mlyle 1548 days ago

Yah, that's a "closed org". When you need to deal with the public at large, you need to deal with user registration issues and spam.

Dobbs 1548 days ago

So now every person who wants to contribute to your open source project has to setup a VPN client?

The parent comment was explicitly about non-closed (i.e. not private) orgs.

ironmagma 1548 days ago

We run Gitea at my company. In fact, we forked it. It could reeeaaaalllly use a rewrite. If anyone is even mildly ambitious about creating a new alternative to Github/Gitea, it's a great time to do that.

KronisLV 1548 days ago

Another self-hosted project in the space that i've seen was GitBucket, although it runs on the JVM (not necessarily a bad thing, just different from Go): https://gitbucket.github.io/

mynameismon 1548 days ago

You might be interested in sourcehut: https://sr.ht

devwastaken 1548 days ago

And who pays for fixing it? Downtimes of self hosted systems using external software can be far longer. GitHub, unlike Amazon and friends, doesn't lie about their downtime. Every saas has hundreds of downtime instances across the board every month. Some are small enough you don't see them. Yet the services still work exceptionally well - and when they don't they get fixed in a quick manner. What takes them an hour would take most private orgs a day.

link

AlexandrB 1548 days ago

> GitHub, unlike Amazon and friends, doesn't lie about their downtime.

Are you kidding? The last 2 incidents were called "degraded performance". Where "degraded" meant I would get nothing but 500 errors accessing GitHub.com either via browser or git itself for the duration of the outage. How is this not lying?

link

everfrustrated 1548 days ago

GitHub is notorious for only noticing outages once the USA morning starts.

If you're using GitHub in Europe or Asia it's not uncommon for GitHub to be offline for many hours before they acknowledge anything.

link

rvz 1548 days ago

Well, I think I have said that since 2020 [0] and it is self-evident that you are better off self-hosting your own Git repo. If you can host a website you can do it. If GNOME, ReactOS, Wireguard, Linux Kernel Project, Mozilla, etc can do it, so can you. Or even use it as a backup / failsafe just in case.

But going 'all in' on GitHub just doesn't make any sense anymore.

[0] https://hn.algolia.com/?dateRange=all&page=1&prefix=true&que...

link

Someone 1548 days ago

But who can host a website? I would be wary of hosting something that isn’t a 100% static site, out of fear of the amount of attention maintenance would take.

Also, quite a few of the non-profits behind the projects you mentioned have multi-million dollar budgets that they can use to administer their git instance, if needed. I don’t think “if they can do it, you can” is a strong argument for those.

link

rvz 1548 days ago

I don't recall ReactOS, or the creators of wireguard having 'multi million dollar budgets'. How is it that even projects like RedoxOS [0] are able to self-host on a GitLab instance using a subdomain, without giant budgets in the millions?

You don't need a 'multi-million dollar budget' to self-host a git repo, and many of these open-source projects have been doing so for years, even before GitHub existed. Even if they did have such a budget, there isn't an excuse left not to self-host and avoid going 'all in' on GitHub.

At the very least I would expect something like what ReactOS is doing by having a self-hosted backup just in case GitHub goes down or vice-versa. [1]

Looks like that is proving to be useful.

[0] https://gitlab.redox-os.org/redox-os

[1] https://github.com/reactos/reactos#code-mirrors

link

Someone 1548 days ago

> You don't need a 'multi-million dollar budget' to self-host a git repo

I never made that claim. The argument was “if X can do it, so can you”.

I pointed out that _some_of_these_ (Mozilla, likely the most extreme of them, had over $400 million in revenues in 2020), are quite different from the typical ‘you’, invalidating that argument.

As always, invalidating an argument doesn’t mean its conclusion is wrong.

link

rvz 1548 days ago

> The argument was “if X can do it, so can you”.

So when are you going to question this user [0] and others here planning to do the same thing for not having a 'multi-million dollar budget' for self-hosting their own services then?

Since clearly according to you they 'can't do it', despite me saying 'if X can do it so can someone else'. Where 'X' can be even a toy project like RedoxOS, or a messenger project like GNU Ring hosted by themselves and accessed via a subdomain.

Seems like they and other lesser known and funded open-source projects are doing just fine like that for years.

[0] https://news.ycombinator.com/item?id=30780874

link

rglullis 1548 days ago

My last bill from Hetzner was ~35€. I host gitea, drone CI, hashicorp vault and my own docker registry/pypi repository. I can add as many users as I want, and I had exactly zero incidents in the past ~6 years since I set this up.

I don't even worry about a strong backup strategy (besides just making occasional snapshots of the data volumes) because this was all set up with IaC tools (Terraform, Ansible) and I have copies of all the code in local repositories.

link

rglullis 1548 days ago

It's almost like people forget that git is a Distributed Version Control System, after all...

link

djbusby 1548 days ago

GitHub/Lab are for more than just a code repo

link

aaaaaaaaata 1547 days ago

Can project management features not be made part of a dumb repo on the db side? (Spoiler: yes, and many projects have explored this; setup unfortunately has never been as easy as "we'll invite u to the gh, check ur email".)

Perhaps with the decentralization push of web3/QR etc, we'll get over the hump.

link

rglullis 1547 days ago

I think that parent means also things like CI, release repository, PR review, etc.

These are not easily portable, but honestly it is because of this lock-in that I prefer to use separate/independent tools. For my open source project [0], I am putting things on GitHub and it is the link that I give to most people, but in reality it is just a mirror of the GitLab repository [1], which I use for CI and static page hosting, and the "project management" is done on Taiga [2].

  [0]: https://github.com/mushroomlabs/hub20
  [1]: https://gitlab.com/mushroomlabs/hub20/hub20
  [2]: https://tree.taiga.io/project/lullis-mushroomlabshub20
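For anyone wanting to try a similar mirror setup, here is a rough sketch that only builds the git commands involved. The `mirror_push_commands` helper and the example URL are hypothetical, not taken from the project above. Note that `git push --mirror` overwrites the target's refs to match the source, so it only makes sense for a remote that is purely a mirror, and `git remote add` will fail if a remote with that name already exists.

```python
def mirror_push_commands(mirrors: dict[str, str]) -> list[list[str]]:
    """Build git argument lists that register each mirror and push all refs to it.

    `mirrors` maps a remote name to its URL. Run each command from inside
    the primary clone, for example with subprocess.run(cmd, check=True).
    """
    commands = []
    for name, url in mirrors.items():
        # Register the mirror as an additional remote.
        commands.append(["git", "remote", "add", name, url])
        # Push every branch and tag; refs missing locally are deleted on the mirror.
        commands.append(["git", "push", "--mirror", name])
    return commands


if __name__ == "__main__":
    # Hypothetical mirror URL, for illustration only.
    for cmd in mirror_push_commands({"github": "git@example.com:me/project.git"}):
        print(" ".join(cmd))
```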

link

mhh__ 1548 days ago

The company I work for has a bunch of non-programmers using and working in gitlab (or "the git"), I can't really see it happening with GitHub regardless of where it was hosted.

Gitlab just seems better for actually running a software project.

link

dbrgn 1548 days ago

Does Gitea support some kind of federation / cross-instance PRs? That's the main thing I'd miss from a self-hosted instance, the ease of getting contributions.

After all, you don't even need Gitea for pure Git hosting. If you have a server with SSH access, just init a bare repo in a directory, push to that, and you're ready to go. No web UI needed.
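Roughly, that workflow looks like the sketch below, which just prints the commands rather than running them. The host `user@example.com`, the path `repos/project.git`, the branch name `main`, and the `bare_repo_commands` helper are all placeholders of mine.

```python
import shlex


def bare_repo_commands(host: str, path: str,
                       remote: str = "origin", branch: str = "main") -> list[str]:
    """Build the shell commands for hosting a repo on a plain SSH server."""
    # Command executed on the server: create a bare repository at `path`.
    remote_cmd = f"git init --bare {shlex.quote(path)}"
    # scp-style URL; a relative path is resolved against the SSH user's home.
    url = f"{host}:{path}"
    return [
        f"ssh {shlex.quote(host)} {shlex.quote(remote_cmd)}",
        f"git remote add {shlex.quote(remote)} {shlex.quote(url)}",
        f"git push -u {shlex.quote(remote)} {shlex.quote(branch)}",
    ]


if __name__ == "__main__":
    # Placeholder host and path; run the printed commands from your local clone.
    for line in bare_repo_commands("user@example.com", "repos/project.git"):
        print(line)
```

Anyone with SSH access to the same account or group can then clone from `user@example.com:repos/project.git`.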

The reason I'm still using GitHub is not code hosting. It's collaboration.

link

dbrgn 1548 days ago

It seems there's a tracking issue here, but it seems stalled: https://github.com/go-gitea/gitea/issues/1612

link

tokumei 1548 days ago

> If you have a server with SSH access, just init a bare repo in a directory, push to that, and you're ready to go. No web UI needed.

Used to do that years ago for my personal projects. Honestly does the trick.

link

brimble 1548 days ago

Gitea gets you: a nice GitHub-like web GUI, including for stuff like managing users; 2FA; some integrations; web hooks without having to add git-hooks to all your repos; and extremely-useful-to-some-projects features like git-lfs support.

If you don't want or need those things, bare git repos are fine and certainly easier to support (not that Gitea's that hard, though a few issues/PRs I've noticed have caused me more than a little concern about the overall quality of the project).

link

encryptluks2 1548 days ago

But by using GitHub for "collaboration" you are sacrificing decentralization.

link

mst 1548 days ago

Absolutely, and I genuinely hate that.

But for new open source projects of mine the ease of contribution and user expectation of a github repository are a trade-off worth making even so (I also maintain a self hosted master git repo that I consider the source of truth to -me- but these days it syncs from, rather than to, github, just because of the trade-offs involved).

link

encryptluks2 1547 days ago

And in return you are sacrificing your code and contributors to a Microsoft-based privacy policy. GitLab is better but still not ideal.

link

Melatonic 1548 days ago

The marketing for building your own new "private cloud" will begin soon I am sure :-D

link

pid-1 1548 days ago

You just made a few OpenStack consultants rise from their graves.

link

dijonman2 1548 days ago

GitHub enterprise is amazing, but I agree that a centrally hosted Git instance of any variety is a liability.

With the advent of the Okta breach I think we will see a reverse in the centralization trend.

link

ransom1538 1548 days ago

"21 incident outages in just 3 months. At this rate the benefits of running your own gitea or gitlab are starting to become competitive."

Oh stop the drama. Fine. Set up your GitLab.

link

adamsmith143 1548 days ago

Is it really though? Are engineers committing so frequently that they can't make it through a few hours without Github?

link

imiric 1548 days ago

GitHub doesn't just host Git repositories. It's the central location for discussions, issues, code reviews, milestone planning, and any CI process like testing or releases. If it's unavailable whole teams can be interrupted.

Git is distributed. GitHub is very much not.

link

lallysingh 1548 days ago

It depends on how many engineers you have! But also, there are plenty of other functions in GH besides raw git, like Wiki/PR/Issues/test/deploy pipelines, etc. It can become pretty critical.

link

jeffwask 1548 days ago

Yes

link

queuebert 1548 days ago

Maybe they measure performance in git commits.

link

duped 1548 days ago

An outage of a few hours can tank a release deadline for me, so yes.

link

Chris2048 1548 days ago

> running your own

assuming that would be flawless, which it wouldn't

link

sonicggg 1548 days ago

Come on, don't be so dramatic. This is not a 911 call center, people will survive these minor outages.

link

mdellavo 1548 days ago

sure but three days in a row?

link

chockchocschoir 1548 days ago

> At this rate the benefits of running your own gitea or gitlab are starting to become competitive

No need, just use Codeberg.org instead. They run Gitea, and it is a free collaboration platform (+ git hosting) for free projects. FOSS/OSS should really consider alternatives to GitHub and GitLab, especially when there are much more FOSS/OSS-friendly platforms around.

link

higeorge13 1548 days ago

The usual services (actions) again down around the same time. This is embarrassing.

link

intunderflow 1548 days ago

With how often these happen we might as well sticky this thread for the next one

link

etimberg 1548 days ago

The quality of GH seems to be slipping

link

Trasmatta 1548 days ago

I've actually been pretty impressed with the quality of the product and new features over the past couple of years, but it seems to be having a lot of stability issues recently.

link

etimberg 1548 days ago

I've liked the new features too, especially after so many years of not many features. Maybe they've moved too fast now

link

xtracto 1548 days ago

Funny that it happened since they were acquired by Microsoft... reminds me of Hotmail, Skype, LinkedIn, Rare, among several others.

link

amelius 1548 days ago

I hope it doesn't affect security ...

link

rvz 1548 days ago

Again? Last time that happened was 24 hours ago? [0] It is really getting unreliably bad. Like I said before, having a self-hosted backup seems to make more sense.

[0] https://news.ycombinator.com/item?id=30767821

link

einpoklum 1548 days ago

The page at the link is not much more informative than the link itself :-(

link

grumple 1548 days ago

Again?! Jeez. I wish I had customers this tolerant.

link

max23_ 1548 days ago

Looks like the same services that were affected in yesterday's incident.

link

mirekrusin 1548 days ago

What's the best crowdsourced status monitor?

link

eckza 1548 days ago

https://outage.bingo/

link

mirekrusin 1545 days ago

+1 :)

link

eatonphil 1548 days ago

Github Actions are back for me now.

link

frjalex 1548 days ago

Looking at the "GitHub" prefix in the title, I was half-expecting this to point to a report explaining the outage a week ago... But rest assured, it is a new outage!

link

teekert 1548 days ago

Oh I thought it was about the one from yesterday :)

link

aaaaaaaaata 1548 days ago

Are their CI/CD toys that shiny that people still willingly choose them even with all the issues?

I find myself regularly asking this — about every major SaaS used for critical ops stuff like this.

link

teekert 1548 days ago

Work chose GitHub (we are a Microsoft shop), and I have to say, I like GitHub a lot. The disruptions have been annoying sometimes, that's true. But due to the nature of Git I could always just keep working.

link

annexrichmond 1548 days ago

I thought it was going to be a Postmortem. I couldn't have been more wrong!

link

toastal 1548 days ago

And to think Git can easily be decentralized. I wonder if the community could fork GitHub to fix it. Oh, it's not open source. Devs must be too busy working on more 'social' features like "For You (Beta)" to milk the attention economy.

link