GitHub Actions Down


	GitHub Actions Down
	89 points by kakamiokatsu 1862 days ago
	And I'm here waiting for my scheduled build to start...

9 comments

rvz 1862 days ago

So now I am expecting a major GitHub Actions incident (worst case, the whole of GitHub) to go down every single month at least once. Last time this went down was last month. [0]

I now doubt if they can consistently manage more than a month without a major incident like this one.

[0] https://news.ycombinator.com/item?id=26666843


purerandomness 1862 days ago

Isn't this normal with hosted solutions?

IIRC, GitHub, Bitbucket and GitLab.com are all unusable for a few hours at least once a month as far back as I can remember.

Isn't this just commonly accepted as a tradeoff for not having to manage the servers yourself?


danielheath 1861 days ago

I would say it's normal for businesses which are engaged in competitive feature development.

Stability & reliability are relatively easy to achieve if you aren't changing the software frequently.


Sebb767 1861 days ago

I'm pretty sure they're constantly working on their software (at least GitLab is), so "aren't changing anything" probably does not apply.


danielheath 1861 days ago

That was my point. If you want stability, look for someone who is no longer funding a substantial development team to work on the software.


talolard 1861 days ago

This is a profound insight.


swebs 1861 days ago

Nope, this isn't normal at all for products on AWS, GCP, or pretty much any other cloud provider. Azure is simply a subpar product and it's time to stop attempting to sweep its awful downtime under the rug.


david_draco 1862 days ago

It could be interesting to track the uptime of such cloud services. Two decades ago, companies prided themselves on 4 or 5 nines (99.99% or 99.999% uptime). Not anymore. Worse is better won yet again ;-)

Companies and programmers should be aware what they get into when they build such dependencies. Distributed git is a thing, but distributed CI/CD that you could also run locally isn't (yet?).
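
To make the "nines" concrete, here is a minimal Python sketch that converts an availability percentage into a downtime budget. The function name `allowed_downtime` and the 365-day year are assumptions chosen for illustration; no provider's SLA is being quoted here.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float,
                     period: timedelta = timedelta(days=365)) -> timedelta:
    """Return how much downtime fits into `period` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the period is the downtime budget.
    return period * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (90.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime(pct)} per year")
```

Under these assumptions, four nines leaves a bit under an hour of downtime per year and five nines only a few minutes, which is why a multi-hour outage every month is far from either.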


Aeolun 1861 days ago

The fact that it’s down is not the problem, but it’s painful that there’s nothing I can do to fix it.

That’s the part I like about self hosting. It may ultimately be down more often, but I never have to tell someone “Nothing to be done, we wait.”.


Sebb767 1861 days ago

Which is better for you, but for your company, "wait, I'm on it" and "wait, they're on it" does not make a world of difference.

What's better, on the other hand, is that you can schedule expected and possible downtimes to a time that causes the least impact to your company; with a SaaS, an update might cause you problems any time.


blacktriangle 1861 days ago

Your second point exactly nails it. Hosted solutions break because they're busy pushing new features I may or may not care about. When self hosting I can decide when upgrading is worthwhile to my needs and then plan when to make risky actions according to my own organization's timeline. Any single org can probably get away with 90% uptime just so long as the downtime is at the correct time.


Aeolun 1861 days ago

While I sort of agree, I guess that’s arguable. My bosses really like telling theirs that we are doing something about it too.

In these kinds of situations, you often end up in a situation where they say ‘the problem is resolving itself in region x’, where region x is not relevant to you at all. If you are fixing your own setup you can focus on exactly what is most important (to you) first.


rurban 1862 days ago

Easy to explain. They switched to the MS Azure cloud for actions.

You won't get the high availability from Microsoft that you are used to from proper cloud services. Plus privacy issues. But it's cheap, in this case free.


Hamuko 1862 days ago

I was amused by reports when Microsoft was in talks with Discord that one of the reasons why Microsoft wanted to buy Discord was because they wanted Discord on Azure. Like, was the grand customer acquisition strategy for Azure just acquiring the companies and then migrating over?


isbvhodnvemrwvn 1862 days ago

My wife works as an AWS/Azure consultant, and she mentions that in our area it's much more common for the non-technical management to push Azure than it is for technology to choose it. Sounds quite IBM-ish/Oracle-ish.


_odey 1861 days ago

When you open a Microsoft account for a new company, they do a lookup to figure out your area of activity and if you're a good match you get contacted by a sales representative asking you if you want to become an Azure re-seller. Basically for services you sell to third parties you get Azure credits meaning your Azure usage is "free", and your clients pay the premium. I know this because I used to work for a company that did this, and from personal experience when starting a company.

Edit: Here is a tip: if you see a "Microsoft partner <TIER> Cloud Platform" badge on an outsourcer's website, stay away.


rjzzleep 1861 days ago

I've had a couple of Fortune 500 companies as clients. Microsoft/Azure is usually brought in as a place that is treated like VMs in the cloud. The setup and management is often handled by Accenture and Infosys. Impossible that that decision was made by Engineering. In fact those Accenture-managed setups are almost unusable for engineers. I can't even begin to fathom how much these companies spend on Accenture to set up Azure in a fashion where you can't do anything.

The worst part about Azure for me is always the list of undocumented bugs you run into. On the surface it looks like everything started as an AWS equivalent, but when you have to drill down on something it almost always has some weird issues that you then find as unresolved complaints on some MS managed github issue list.

But hey, maybe I was just luckier with the other cloud providers.


robbyt 1861 days ago

I worked on a project automating some parts of an Azure infrastructure for a big company. Half-way through development, JSON integers returned by Azure changed from strings to ints, back to strings. E.g., "42" became 42, then a few weeks later went back to "42".

This and other API weirdness gave me such Azure PTSD that I promised myself I would never touch it again.
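
One defensive way to cope with a field that flips between `"42"` and `42` is to normalize it at the point where the response is parsed. The Python sketch below is an illustration only: the helper `as_int` and the field name `instance_count` are hypothetical and not taken from any Azure API.

```python
import json


def as_int(value):
    """Coerce a JSON value that may arrive as 42 or "42" into an int."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError(f"expected an integer, got bool: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value.strip())  # raises ValueError on non-numeric text
    raise TypeError(
        f"expected an integer or numeric string, got {type(value).__name__}"
    )


# Both response shapes described above normalize to the same value.
payload_a = json.loads('{"instance_count": "42"}')  # hypothetical field name
payload_b = json.loads('{"instance_count": 42}')
count_a = as_int(payload_a["instance_count"])
count_b = as_int(payload_b["instance_count"])
```

Keeping the coercion in one helper means a future flip in the API only shows up as a no-op rather than as a crash deep inside the automation.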


llama052 1861 days ago

That's exactly the situation I'm in at my current work environment now. All Azure, and everyone in Engineering/Devops hates it. It's a business decision though.


cjohansson 1861 days ago

Sounds like MS, that is the only way they can get organic customers. All their popular products are originally built by someone else except Windows of course


saurik 1861 days ago

Buying products though is very different than buying customers; if we try to map this hilarious customer "acquisition"--which is now a double entendre ;P--strategy to a more typical product, it would be akin to saying "no one is using Excel, so let's start buying large accounting firms currently using VisiCalc to migrate over".


NicoJuicy 1861 days ago

Azure, .net, office, office 365, teams, ml.net, ...

Not a single one i could manage building ( eg. Teams has bots, quick to create apps and pretty advanced cam features)


chrisweekly 1861 days ago

Teams UX is a hot mess; it's just astonishingly bad.


ithkuil 1862 days ago

Hopefully this serves as a good dogfooding exercise for MS now and helps them improve things.

Every time I tried azure I was disappointed. But that doesn't mean they can't fix it; I bet there are now tons of talented engineers working there. My best wishes for them to up the quality of azure. I think diversity / alternatives are a good thing.


twistedpair 1859 days ago

Down ~4~ 9 hours so far today.


zomglings 1861 days ago

In the last week, there have been at least two ~1 hour periods where actions were stuck in the queue. Even posted about this on the GitHub community, but no response. [0]

As unreliable as GitHub Actions are, their convenience factor and price are right.

We take a very simple measure so we don't get fucked by these kinds of incidents: we don't use any actions from the marketplace.

All our GitHub Actions workflows are bash scripts that we wrote (and which often live in our repos at `deploy/deploy.bash`). The secrets necessary to run these scripts are available to the infrastructure team on 1Password.

This makes it easy for us to deploy manually and retroactively reflect that release on GitHub (e.g. through a tag or a release).

[0] https://github.community/t/github-action-stuck-on-starting-w...


xiwenc 1861 days ago

This is an interesting approach. I also dislike the current architecture they are pushing. Actions can break any time. That makes pipelines/actions very fragile and, as you pointed out, not portable either.

The portability could be fixed by having a local CLI runner that understands action YAMLs. Would be interesting to explore this.

For a while I have been fantasizing about a universal pipeline language. Like having LLVM with a unified model that can translate into different vendor implementations.


tailspin2019 1861 days ago

> The portability could be fixed by having a local cli runner that understand action yamls. Would be interesting to explore this.

This is a great idea. I guess the challenge would be keeping up with the more advanced aspects of Actions, eg. spinning up multiple VMs during the build process and other things that have a heavy infrastructure (or platform-specific) element.

I'm currently caught by the GitHub Actions downtime, but like the GP, all my build scripts generally make very light use of the platform-specific features and generally I keep most of the build logic in a build.sh file.

So I can build production builds locally if I need to - but a CLI that lets you "properly" run the GitHub yml files locally would be very interesting.


andreareina 1862 days ago

Incident url: https://www.githubstatus.com/incidents/zbpwygxwb3gw


limoce 1862 days ago

My workflows have been queued for at least four hours. It's a hotfix commit. GitHub Actions helped me a lot in synchronizing updates with other repositories automatically, but this time I have to apply it manually. Because this is the first time I've experienced a GitHub Actions incident, I was wondering whether I had run out of my free quota for this month...

By the way, should GitHub send an email to the owner if any workflow has been delayed for an unreasonable time?


ddlsmurf 1862 days ago

Personally I'd rather they focus on the issue rather than negotiating with PR and legal to formulate an e-mail. If you rely on an external service and don't monitor it that's on you.


waheoo 1861 days ago

What a trite response.

The question was: do they send an email if your job is delayed or late for whatever reason?

Not, hey why don't they email us all right now about the issue.

And no, it's not on me to monitor every little thing I rely on. Do you monitor kernel updates? I bet you don't. Besides that monitoring and logging for any provided service is exactly how one is supposed to monitor said external services so asking about monitoring options and being told, look buddy it's your job to monitor for this is just fucking rude.


ddlsmurf 1860 days ago

Well, I do, but yes no one can monitor everything. The question was should they have sent an e-mail and I shared my preference as to their prioritization of resources. And yes, if something I wasn't monitoring breaks, I still assume responsibility, most especially if it affects production. And no it wasn't rude, you seem very sensitive.


dap 1861 days ago

Have you tried monitoring GitHub Actions? It’s not uncommon for me to find that actions just don’t run for some reason. The docs are so incomplete that it’s hard for me to know why.


ddlsmurf 1861 days ago

I can't say github actions but azure devops yes, I have an http endpoint I want stuff to hit with outcome and if it's not I get bugged. Anything is going to break, for external stuff this is the only way to estimate the cost/benefit of a contingency.
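
The "endpoint that expects to be hit" pattern described above is often called a heartbeat or dead man's switch. A minimal Python sketch of the bookkeeping behind such an endpoint follows; the class `HeartbeatMonitor`, its method names, and the job names are all hypothetical, and the HTTP layer and alerting are deliberately left out.

```python
import time


class HeartbeatMonitor:
    """Track when each pipeline last reported in and flag the silent ones."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence = max_silence_seconds
        self.clock = clock  # injectable so the logic can be tested
        self.last_seen = {}
        self.last_outcome = {}

    def expect(self, job_name):
        """Register a job so that never reporting at all is also noticed."""
        self.last_seen.setdefault(job_name, self.clock())

    def record(self, job_name, outcome):
        """Call this from the HTTP handler when a pipeline reports in."""
        self.last_seen[job_name] = self.clock()
        self.last_outcome[job_name] = outcome

    def overdue(self):
        """Return the jobs that have been silent for longer than allowed."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )

    def failed(self):
        """Return the jobs whose most recent report was not a success."""
        return sorted(
            name for name, outcome in self.last_outcome.items()
            if outcome != "success"
        )
```

A periodic task would call `overdue()` and `failed()` and page someone if either list is non-empty; because the check runs outside the CI provider, a stuck queue there shows up as silence rather than going unnoticed.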


sebmellen 1862 days ago

We build an Electron app for MacOS regularly. We run these builds on Github’s MacOS VMs.

The internet fails to connect when running yarn install about 3 out of every 5 times.

We’ve gotten multiple refunds for this issue, but it’s still a complete mess. Unfortunately we’ve built much of our process around GH Actions... if not I’m sure we would be on CircleCI or TravisCI already. We’ve also considered switching to self-hosted runners.


pilif 1861 days ago

We’re running everything on self-hosted runners and it works like a charm. They are also way, way faster than what GitHub can provide.

We get the advantages of the huge GitHub Actions ecosystem while having impressive (and fully controllable) performance and very easy access to our infrastructure for deployments.


giorgioz 1861 days ago

What do you use to run the self-hosted runners?


iBotPeaches 1861 days ago

In our case, we use a mac mini solely for building mobile applications via fastlane with a self-runner. Doing that with the GitHub runner on Mac would be mega expensive. The rest of the CI/test process runs on GitHub runners.


easton 1861 days ago

Does that work with M1/Apple Silicon Macs or Intel-only?


pilif 1860 days ago

GitHub right now only has their self-hosted Mac runner released as an Intel Build, but it works fine through Rosetta, though running Apple Silicon Xcode might require some additional wrapping of commands with `arch` which might or might not be built into the pre-built action you're using.

We're currently using an Intel Mac we've used from before we migrated from Jenkins to GitHub actions, so your mileage may vary.


desert_boi 1861 days ago

We're using https://next.yarnpkg.com/features/zero-installs, and it gets rid of one off issues like this (unless you have native deps which require an install).


sebmellen 1861 days ago

Thank you for the link, looks interesting! I wish this were available in Yarn V1 :/


twistedpair 1859 days ago

FWIW, GitHub Actions macs are MacStadium macs.


bob1029 1861 days ago

We only use GH Actions for check builds right now, which are easily disabled as a temporary measure.

Having the ability to build & deploy your software outside the confines of a cloud vendor is essential to survival. When the automation works, it's great. When it doesn't, have a manual process that you can follow on a local workstation.

At the end of the day, you can always email the customer a zip file and walk them through installing the update in production. That is, as long as you didn't make your architecture and CI/CD one and the same thing, in which case you probably need to hit the reset button and try again.


vendiddy 1861 days ago

Great advice. How is this managed where you are? Do you share the same code between CI and the local workstation?


bob1029 1861 days ago

Everything required to build our application lives in a single code base & Visual Studio solution. The application is capable of building itself from source.

If you know how to do things like Process.Start, there's really no excuse for not being able to automate your build processes using code. MSBuild has a pretty damn simple set of CLI args if you are just doing modern .NET 3.x/5.x apps.

  git clone <my repo path>
  cd <my repo path>
  dotnet build --configuration Release
  # copy build artifacts to wherever they need to go

That's about it for us.

We use SQLite, so there aren't any dependencies outside of any particular checkout of the repo.
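
The "automate your build using code" idea above can be sketched in a few lines. The version below uses Python's `subprocess.run` in place of the .NET `Process.Start` mentioned in the comment, purely for brevity. The function names, the `artifact_dir` layout, and the output location are hypothetical; only the `dotnet build --configuration Release` invocation comes from the steps listed above.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(configuration="Release"):
    """Return the same command CI would run, so local and CI builds match."""
    return ["dotnet", "build", "--configuration", configuration]


def build_and_collect(repo_dir, artifact_dir, output_dir, run=subprocess.run):
    """Build inside `repo_dir`, then copy the artifacts to `output_dir`.

    `artifact_dir` is relative to the repo and is an assumption here;
    adjust it to wherever the project actually writes its build output.
    `run` is injectable so the function can be exercised without dotnet.
    """
    # check=True makes a failed build raise instead of continuing silently.
    run(build_command(), cwd=repo_dir, check=True)
    shutil.copytree(Path(repo_dir) / artifact_dir, output_dir,
                    dirs_exist_ok=True)  # requires Python 3.8+
```

Because the CI workflow and a developer's workstation would both call the same function, a CI outage only removes the trigger, not the ability to ship.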


19h 1861 days ago

Tip: you can run and host your own GitHub Actions runners.

https://docs.github.com/en/actions/hosting-your-own-runners/...


OzzyB 1861 days ago

Since I have a job that runs every hour, I can see that it's been down for ~4 hours.


afrcnc 1861 days ago

I wonder what it could be: https://therecord.media/github-investigating-crypto-mining-c...
