by ksdsh 4992 days ago
My site is affected. I don't know how to handle this situation because the issue affects the whole region.

You should have duplicate infrastructure in another region which you fail over to automatically or manually.
We are going to set this up for our app, however there's a limit to how useful it is when you are reliant on services such as MongoHQ/PubNub (in our case) that are also affected!
Well, it's not going to be cool or best practice to use cloud services for your startup for the next week or two. Fortunately people on HN will then forget until the next AWS outage and your setup will be fine again.
Unfortunately, duplicating across regions is not well supported by AWS. Any idea how to do that without completely rebuilding the infrastructure in another region?
I assume you are using the AWS dashboard to manually deploy instances and setting them up by hand? First step is to get your infrastructure to the point where it deploys and scales up and down by itself. Once you manage that, moving to or keeping hot spares in another region is pretty easy.

Look into automated deployment tools like Foreman and configuration management tools like Puppet.
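
To make the "deploys by itself" idea concrete, here is a rough sketch in plain Python of describing the desired state per region and computing what has to change. This is not how Foreman or Puppet work internally; `RegionSpec`, `plan_changes`, the region names and the instance counts are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionSpec:
    """Desired state for one region (hypothetical structure)."""
    name: str
    web_instances: int


def plan_changes(desired, running):
    """Return {region: delta}; positive means launch, negative means terminate.

    `running` maps region name to the number of instances currently up.
    Regions that already match the desired count are left out of the plan.
    """
    plan = {}
    for spec in desired:
        delta = spec.web_instances - running.get(spec.name, 0)
        if delta != 0:
            plan[spec.name] = delta
    return plan


# Illustrative numbers: a full primary plus one hot spare in a second region.
desired = [RegionSpec("us-east-1", 4), RegionSpec("us-west-2", 1)]
running = {"us-east-1": 4}
print(plan_changes(desired, running))  # {'us-west-2': 1}
```

Once the setup is expressed as data like this, adding a second region is one more entry in the list rather than an afternoon in the dashboard.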

You assumed right.

We are learning the hard way that EC2 Availability Zone separation is not enough, and that EC2 is lacking some key tools for multi-region setups.

Thanks for the hints, I'm going to check them out.

I echo this... although I don't personally use Foreman. You can do scaling type stuff without the tool and I just use POP (plain old puppet). For me personally it took longer to get on the config-management bandwagon but once I did I have never looked back.

Learning to use Puppet, Chef, Salt or similar will only bring benefits!

How would you recommend performing that failover, when Elastic Load Balancers only cover a single region? If you host your own load balancer it'll have to be hosted somewhere, and DNS failover seems unpopular [1].

[1] http://serverfault.com/questions/60553/why-is-dns-failover-n...

DNS failover is fine, even if it's manually triggered. We are almost two and a half hours into this network event, so even if you set a 5 minute TTL on your records and it takes you 15 minutes to respond, you are still way ahead of the game.
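
The arithmetic behind that, as a quick sketch. It assumes the worst case is simply the response time plus one full TTL (a resolver that cached the old record just before you changed it), and rounds "almost two and a half hours" to 150 minutes; the function name is only for illustration.

```python
def worst_case_failover_minutes(ttl_minutes, response_minutes):
    """Upper bound on time until clients see the new record.

    Assumes resolvers honor the TTL: time to notice and flip the record,
    plus one full TTL for the last cached copy of the old record to expire.
    """
    return ttl_minutes + response_minutes


outage_so_far = 150  # roughly two and a half hours, in minutes
recovery = worst_case_failover_minutes(ttl_minutes=5, response_minutes=15)

print(recovery)                  # 20
print(outage_so_far - recovery)  # 130
```

So even a slow manual failover would have had the site back more than two hours ago.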

The idea that some DNS resolver somewhere caches records longer than the TTL is mostly a myth. It does happen occasionally on some mobile networks and networks behind satellite uplinks, but those generally have larger network issues anyway.