Maybe it's because I just personally identify with the founders of github (i.e. entrepreneurial sw engineers), but I'm starting to get mad at whoever keeps doing this. Here's hoping that with all the smart people this is affecting, the people responsible will be tracked down and exposed.
May they be doomed to get a call like I did some 20 years ago from a large software company I'd consulted for:
"Mike, remember that project you did for us last year? Yeah, we've been shipping it with our product for a while - working great, thanks! Say... I don't suppose you might happen to still have a copy of the source code anywhere? I know, you probably deleted it after the project was over, but you were always so good about making backups - do you suppose you could rummage through your old backup tapes and see if there's anything? No, it's nothing like that! Well, we seem to have lost all our copies of the source and really hope you could help us out."
From what I've seen it's usually a rational business or policy decision. And your adversaries are in, or acting through, states with a weak rule of law. "Exposing" these actors is impossible/useless until you address that.
Easier said than done. We've had DDOS issues in the past as well, and getting it resolved - even by throwing money at the problem - is nontrivial.
What amounts to throwing a massive amount of hardware at the problem (i.e., boxes that can handle 10-100+ Gbps of traffic, filter out the attacks, and pass only legit stuff down to your servers) is expensive[1] and causes all sorts of unexpected behavior: API clients mysteriously break, good traffic gets mistakenly dropped, latency is added to the whole process, etc. It gets even weirder on SSL-protected sites. And it's all dependent on attackers not getting the IP of your actual servers, which they could then just attack directly.
[1] For sites with even not a whole lot of traffic, you're talking a one-year contract easily in the range of an engineer's salary. I wouldn't be surprised if the cost to protect sites with as much traffic as Github exceeded $1m/year. Even if you have plenty of cash in the bank, that's one hell of a pill to swallow.
Github can easily afford to use someone like Prolexic. And they should.
When you say things like "And it's all dependent on attackers not getting the IP of your actual servers" this makes me wonder how much you understand the subject matter. There are many, many options.
Prolexic's servers don't take the load if the attackers know where the computers behind the scrubbers are. Configuring iptables to ignore all traffic not coming from Prolexic's IPs doesn't come close to fending off a DDOS.
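To put rough numbers on why a host-level filter does not help, here is a minimal sketch. It rests on assumptions that are mine rather than Prolexic's: the attack traffic hits the origin's uplink directly, the uplink is the bottleneck, and a congested uplink drops packets without regard to their source. The function name and the example figures (1 Gbps uplink, 10 Gbps attack, 0.1 Gbps legitimate traffic) are hypothetical.

```python
def surviving_legit_share(link_gbps: float, attack_gbps: float, legit_gbps: float) -> float:
    """Estimate the fraction of legitimate traffic that gets through the uplink.

    Simplified model (an assumption): when the offered load exceeds the link
    capacity, packets are dropped in proportion, no matter who sent them.
    A firewall on the host only sees what already made it through the link.
    """
    offered = attack_gbps + legit_gbps
    if offered <= link_gbps:
        return 1.0  # link is not saturated, nothing is lost upstream
    return link_gbps / offered


# Hypothetical figures: a 1 Gbps uplink facing a 10 Gbps flood.
share = surviving_legit_share(link_gbps=1.0, attack_gbps=10.0, legit_gbps=0.1)
print(f"{share:.1%} of legitimate traffic reaches the host")
```

Under this model the iptables rule never gets a chance to matter: most legitimate packets are lost before they reach the machine that would have accepted them.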
I know this because I was told this by Prolexic while configuring our servers to sit behind their scrubbing servers while we were under an equally crippling DDOS (one that took down half the customers in our datacenter, not just us). So while I haven't examined their tech stack under a magnifying glass, I'm not exactly talking out of my ass here.
Yes, there are other options but those don't take an hour to implement like signing a contract and changing a few DNS entries does. And when these conditions exist, you need an answer that can be implemented in an hour.
You are fabricating straw men. They do not need "an answer that can be implemented in an hour." They have been in business for 4 years, and this particular string of DDoS attacks has been going on for several days now. This is both a planning failure and an incident response failure.
Your comment about iptables is odd. I don't know why iptables would be relevant here; I suspect we are talking about implementations several orders of magnitude different in size. Certainly one would drop traffic at the edges and not do filtering on end nodes.
Speaking from experience, most companies don't think to implement DDOS protection until they're under attack. It's just not on most people's checklists. Hence the need to implement something in an hour. The fact that it's a problem proves my point.
Yes, it sounds like our scales here are quite different. I'm referring to a few machines in a single data center, not hundreds being geographically distributed.
Devil's avocado here: let's say you pay $100/mo for a gym membership and they shut down three days in a row because somebody called in a threat. How upset would you be at the gym?
A malicious attack by a third party is different from, say, the gym allowing black mold to grow in the locker room. I'd quit a gym if they had black mold. That's mismanagement. I wouldn't quit a gym if malicious third party intervention inconvenienced me.
Besides, GitHub is obviously more concerned about this than you or I could ever be. And having money doesn't make infrastructure magically appear.
I pay GitHub too. My company relies on it. I, too, was slightly inconvenienced this week. I was also slightly inconvenienced when I had to make a u-turn because the Battery Tunnel southbound on-ramp was closed. So what?
What makes DDOSes different from black mold? Both are expected risks and should be mitigated. Yeah, there are sentient actors behind the DDOS, but GitHub has to deal with it at the level of their infrastructure either way.
The fact that there are sentient actors behind the DDoS _is_ the difference.
You can reliably predict and protect against things like network outages, server failures, full datacenter failures (black mold)--you can directly measure their impact and plan failover paths. A DB server goes out? Whatever! That's why you have a hot backup or two online and ready to go.
What you can't predict is exactly how far a malicious third party will go to hurt you. You can't predict how many dollars they'll spend on their botnet minutes. You don't know if they're going to attack your infrastructure or the DNS. Can buying more bandwidth fix the problem? If so, how much more? And will the attacker simply up the ante when they see that you're recovering? Can filtering requests fix the problem? If so, will the attacker provision different resources to attack you with?
This isn't simply a matter of infrastructure, buying the right equipment, or setting things up "just right" precisely because there is a sentient actor trying to hurt you. It's more like a game of chess.
If high availability git is so critical then why not run your own git servers instead of or in addition to github?
I find it slightly ironic that the entire point of git is that it is distributed version control but 90% of git use seems to be focused around a product from a single company.
The wikis are git repos as well. Sync them periodically and you're good. Plus it's useful as a backup, because I don't think github archives your reflog.
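A minimal sketch of the periodic sync, assuming `git` is on the PATH and that the wiki is reachable at the usual `<repo>.wiki.git` clone URL. The function names and the backup directory layout are hypothetical, and the owner/repo values in the usage note are placeholders.

```python
import subprocess
from pathlib import Path


def wiki_url(owner: str, repo: str) -> str:
    """Clone URL of a repository's wiki, which is itself a git repo."""
    return f"https://github.com/{owner}/{repo}.wiki.git"


def sync_command(owner: str, repo: str, backup_root: str) -> list[str]:
    """Build the git command for this run.

    First run: make a bare mirror clone. Later runs: fetch into that mirror.
    """
    target = Path(backup_root) / f"{repo}.wiki.git"
    if target.exists():
        return ["git", "--git-dir", str(target), "fetch", "origin"]
    return ["git", "clone", "--mirror", wiki_url(owner, repo), str(target)]


def sync_wiki(owner: str, repo: str, backup_root: str) -> None:
    """Run the sync; raises CalledProcessError if git fails."""
    subprocess.run(sync_command(owner, repo, backup_root), check=True)
```

Calling something like `sync_wiki("someuser", "someproject", "/var/backups/wikis")` from cron would keep a local copy current; the same pattern works for the main repository by swapping the URL.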
Issues are normally mirrored to e-mails (caveat: you don't get mail for your own comments), so you can mostly pick up existing threads if your e-mail address book can find the github users involved. If they didn't obscure recipients (at least within an organisation — because I don't think address-book lock-in is worth inconveniencing paying clients), and made an auto self-bcc of your activity, issues would be entirely disaster resistant.
If you have GitHub enterprise, it is run on your stuff and you get your own Wikis, issues, unlimited repos, etc. Even your own Gist that you can wrap behind your own CAS.
I'm sure that's what they are doing now. However, it takes time to set up new servers as well as to write code (that's been thoroughly audited) to help protect their existing and future servers.
>Perhaps I'm misunderstanding: I thought one goal of DVCS was to remove central points of failure? In that sense, isn't a central "hub" regressive?
This meme is getting really, really tiresome. Github being down is NOT a central point of failure. Most people know that setting up your own git server is trivial, literally a 3-4 step process. We know that we don't lose our files, our history, our working tree, etc.
The "git" in Github is easily replaced. The "hub" part has its own value. The communication tools, the well-presented diffs, the inline-editing capability, issues, wiki, etc. That's the value people are gnashing their teeth over.
Git is distributed and there's no reason you should have to stop working, or committing, just because github is temporarily unavailable. At least for dependencies only on git. Losing access to wikis, pull requests, and issues may be a problem for some teams.
Back in 2009 when it happened to bitbucket, this was afaik due to hosting a particular project (hurting bitbucket was a side effect of hurting this particular project; some communities seem to be happy to resolve issues with DDoS attacks...).
It could just be some rogue deployment script running from EC2 that is a little more active than it should be. Imagine someone is deploying their 1GB repo from GitHub to 100 small EC2 instances :)
My startup cucumbertown.com is hit with similar issues.
Initially we blocked all EC2[1] IPs & the Spamhaus IP list. But then we realized that Flipboard proxies[2], some blog aggregation proxies, etc. are based on EC2 machines.
What would be a good way to block such rogue machines? Is there a community-sponsored list of EC2/Rackspace IPs that are creating issues?
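One way to keep a broad block while sparing known-good proxies is to check an allowlist before the blocklist. The sketch below shows only that ordering. The CIDR ranges are reserved documentation addresses standing in for real data, not actual EC2, Spamhaus, or Flipboard ranges, and the names are hypothetical.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real provider data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
# Known-good proxies that live inside a blocked range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr should be rejected.

    The allowlist wins over the blocklist, so a trusted proxy hosted on a
    blocked cloud range still gets through.
    """
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in ALLOWED_NETWORKS):
        return False
    return any(ip in net for net in BLOCKED_NETWORKS)
```

This does not answer where a trustworthy list of offending addresses would come from; it only keeps the exceptions manageable once you have one.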
Banks were being hit the first week of October, then I know some VoIP servers were being hit such as Callcentric by DDoS. I can see why the banks were hit, but not why so many much smaller businesses are being attacked.
I don't think so. If you were hosting GitHub you would figure out pretty easily if it was related to cloning a specific repo from AWS and just disable the account hosting the repo.
Care to elaborate why GitHub would be a target for a state-sponsored DDOS attack? Seems a little far-fetched for a website that is virtually unknown outside of the developer community.
In the broadest sense, github is a site where anyone can upload and publicize any file of reasonable size. Depending on who is uploading what, that could easily make them a target.
There's some statistic out there somewhere from some paper which found that like 3 out of every 4 (or something ridiculous like that) cyber attacks on the US government come from China, so... it's not that far-fetched.
LOIC is pretty easy to filter, it's about a 1 out of 10 on the difficulty scale. Either GitHub as a whole is technically incompetent, or they are getting hit with something built by big kids.