by warrentr 2917 days ago
This is very concerning but can happen on AWS as well. On July 4th last year at about 4PM PST, Amazon silently shut down our primary load balancer (ALB) due to some copyright complaint. This took out our main API and several dependent apps. We were able to get a tech support agent on the phone, but he wasn't able to determine why this happened for several hours. Eventually we figured out that another department within Amazon was responsible for pulling down the ALB in an undetectable way. Ironically we are now in the process of moving from aws -> gcp.
5 comments

My coworker is running a hosted affiliate tracking system on AWS as part of our company. He regularly has to deal with AWS wanting to pull our servers because of email spam -- not because we're sending spam emails, but because some affiliate link is in a spam email that resolves to our server, and Spamhaus complained to AWS.

Usually this can get handled after a few days of aggravating emails back and forth, we get our client to ban the affiliate in question, and move on with our days with no downtime. But a few weeks ago my coworker came in to find our server taken offline, because AWS emailed him about a spam complaint on a Friday night, and they hadn't gotten a response by Sunday. It'd been down for hours before he realized.

They'd just null-routed the IP of the server, so he updated IPs in DNS real quick, but he then spent half a day both resolving the complaint and getting someone at AWS to say it wouldn't happen again. They supposedly put a flag on his account requiring upper management approval to disable something again, but we'll see if that works when it comes up again.

You're going to have to go multi-cloud if you truly want to insulate yourselves from this sort of problem.

If and when you do, give serious consideration to how you handle DNS.

Fwiw, Ansible makes the multicloud thing pretty straightforward as long as you aren’t married to services that only work for a specific cloud provider.

For that, you should consider setting up multiple accounts to isolate those services from the portable ones.

Wouldn't that be Terraform (perfect for setting up cloud infrastructure) vs. Ansible (can do all, but more geared to provisioning servers you already have)?
Ansible uses Apache Libcloud to run just about anything you need on any cloud provider in terms of provisioning. Once provisioned, it will handle all of your various configuration and deployment on those.

Also plays really nicely with Terraform.

How does Ansible make it straightforward? As far as I know, it doesn't help with networking failover, load balancing, data consistency, or other aspects of distributed systems, and running one application across clouds is certainly a distributed systems problem, not a deployment problem.

Ansible helps deploy software, but deploying software is the smallest problem of going multi-cloud.

See reply to other comment.
I know what ansible is and can do. Your other comment is about how it can provision and deploy things. While true, it's unrelated to my point that that's the least of your problems in a multi-cloud world.
A lot of that depends on scale too. I was mostly talking about the ability to standardize configuration so that you could replicate your infrastructure on multiple providers. Essentially just making sure that you have a backup plan/redundancy in case something happens and you find yourself needing to spin things up elsewhere on short notice.

You're absolutely right that running them at the same time, data syncing, traffic flow, etc. is much more complicated.

Also check out Mist.io. It's an open source multi-cloud management platform that abstracts the infrastructure layer to help you avoid the lock-in.

Disclosure: I'm one of the founders.

Ansible is great at doing the things that Ansible does!
What are popular multi cloud solutions if you use AWS or GCP services that have proprietary APIs? Are there frameworks that paper over the API differences?
mist.io supports most public and private cloud platforms. Also, it's open source https://github.com/mistio/mist-ce
What's the difference between mist.io and Apache's libcloud?
Apache libcloud is a Python library that's used primarily to create, reboot & destroy machines in any supported cloud.

Mist.io is a cloud management platform that uses Apache libcloud under the hood. It provides a REST API & a Web UI that can be used for creating, rebooting & destroying machines, but also for tagging, monitoring, alerting, running scripts, orchestrating complex deployments, visualizing spending, configuring access policies, auditing & more.

What's the DNS solution here? Something like Cloudflare or Edge?
The correct DNS solution is to use multiple providers.

See: Route53 and Dyn outages in the past couple years.
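As a rough illustration of "multiple providers", here is a minimal Python sketch that checks whether a zone's nameserver list spans more than one provider. The hostnames are hypothetical, and grouping by the last two DNS labels is a naive assumption: real providers often spread their nameservers over several domains, so treat this as a sanity check rather than a real audit.

```python
from collections import defaultdict


def group_nameservers_by_provider(nameservers):
    """Group NS hostnames by a naive provider key (the last two DNS labels)."""
    groups = defaultdict(list)
    for ns in nameservers:
        # Drop the trailing root dot and normalize case before splitting.
        labels = ns.rstrip(".").lower().split(".")
        provider = ".".join(labels[-2:])
        groups[provider].append(ns)
    return dict(groups)


def has_provider_diversity(nameservers, minimum=2):
    """True if the NS set appears to span at least `minimum` providers."""
    return len(group_nameservers_by_provider(nameservers)) >= minimum


# Hypothetical NS sets, not real providers.
single = ["ns1.provider-one.example.", "ns2.provider-one.example."]
mixed = ["ns1.provider-one.example.", "ns1.provider-two.example."]

print(has_provider_diversity(single))  # False
print(has_provider_diversity(mixed))   # True
```

In practice the NS list would come from the registrar or a DNS lookup; it is hard-coded here so the sketch runs without network access.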

They shut down just the load balancer?

Forgive my ignorance, but that seems like a weird choice compared with cutting access to the servers or handling the copyright issue in some more formal way...

It's also kinda concerning that multiple departments can take enforcement-type action without the others knowing about it. That seems way disorganized / a recipe for disaster.

Whoever reported the violation probably identified the public IP address of the ALB and notified Amazon.
Makes sense but you would think someone at AWS would handle it... more systematically.
> Ironically we are now in the process of moving from aws -> gcp.

Why not Azure? They have a solid platform and (at least for a MSFT partner) their support is top-notch.

I respectfully disagree. I have worked on two projects with Azure both with big accounts, one even so big that we had senior Azure people sitting in our teams. Both had the highest possible support contract.

Yet their support didn't ever solve a problem within their SLA's and sometimes critical level tickets were hanging for months.

Plus my impression is that whereas AWS (and possibly Google) clouds are built by engineers using best practices and logic, Azure products always felt very much marketing-driven, e.g. marketing gave engineering a list of features to launch and engineering did the minimum effort possible to have the corresponding box ticked. I absolutely hated working on Azure and now won't accept any contract on it.

Documentation is horrible or nonexistent, things just don't work or have weird limitations or transient deployment errors, the architectural and implementation choices are super weird, and you never escape the clunkiness of the MS legacy, for example AD.

We did have the same issues back in beta and were forced to build Chaos Monkey degrees of robustness into our platform. Was this experience of yours a while back? There are now a few people at work who even run VMs on it as their daily driver.
> Yet their support didn't ever solve a problem within their SLA's

What does this mean?

Service Level Agreements dictate the quality, availability, and responsibilities with the client. They put bounds on how long things will take to get answered, and sometimes fixed.

GP is saying that even though they had a contract to resolve issues within X hours/days the issues were not being solved within X hours/days.

Cynically: most SLAs with the 'Big Boys' tend to give guarantees about getting an answer, not a solution. "We are looking into the problem" may satisfy the terms of a contract, but they don't satisfy engineers in trouble.
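To make the response-versus-resolution distinction concrete, here is a minimal Python sketch. The one-hour response window and the timestamps are made-up assumptions, not terms from any real support contract. It shows a ticket that satisfies a first-response SLA while still being open a month later.

```python
from datetime import datetime, timedelta
from typing import Optional


def first_response_met(opened: datetime,
                       first_reply: Optional[datetime],
                       response_sla: timedelta) -> bool:
    """True if the first reply arrived within the response SLA window."""
    return first_reply is not None and first_reply - opened <= response_sla


def time_open(opened: datetime,
              resolved: Optional[datetime],
              now: datetime) -> timedelta:
    """How long the ticket has been (or was) open, resolved or not."""
    return (resolved or now) - opened


# Hypothetical ticket: answered in 45 minutes, still unresolved after 30 days.
opened = datetime(2018, 1, 1, 9, 0)
first_reply = opened + timedelta(minutes=45)
now = opened + timedelta(days=30)

print(first_response_met(opened, first_reply, timedelta(hours=1)))  # True
print(time_open(opened, None, now).days)                            # 30
```

The contract is "met" in the first sense even though the engineers are no better off in the second.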

I know what an SLA is, but I have never seen an SLA from Azure dictating a guaranteed resolution time. They only give the SLA for time until initial reply as far as I know. I suspected the person I replied to had misunderstood what it is they purchased. Maybe in some cases, if you pay them some obscene amount of money, you can purchase an SLA for time to resolution, but I don't think that's the case here.
Can't you end-to-end encrypt your data, so that Amazon can't run their copyright filters over them?
In this case the complaint was against some image(s) we were publicly hosting. We've taken steps to isolate our file hosting from the rest of the system in case this were to happen again. We only host images for fashion blog posts written by staff so I imagine other aws customers have had a much worse time in this regard.
So the copyright claim was legitimate?