|
I’ve been running platform teams on AWS for 10 years now, and working in AWS for 13. For anyone looking for guidance on how to avoid this, here’s the advice I give the startups I advise. First, if you can, avoid us-east-1. Yes, you’ll miss new features, but it’s also the least stable region. Second, go multi-AZ for production workloads. The safety of your customers’ data is your ethical responsibility. Protect it, back it up, and keep it as generally available as is reasonable. Third, you’re gonna go down when the cloud goes down. Not much use getting overly bent out of shape. You can reduce your exposure by using just their core systems (EC2, S3, SQS, LBs, CloudFront, RDS, ElastiCache). The more systems you use, the less reliable things will be. However, running your own key-value store, API gateway, event bus, etc., can also be way less reliable than using theirs. So, realize it’s an operational trade-off. Degradation of your app / platform is more likely to come from you than from AWS. You’re gonna roll out bad code, break your own infra, and overload your own system way more often than Amazon is gonna go down. If reliability matters to you, start by examining your own practices before thinking about things like multi-region or super durable, highly replicated systems. This stuff is hard. It’s hard for Amazon engineers. Hard for platform folks at small and mega companies. It’s just hard. When your app goes down, and so does Disney+, take some solace that Disney, with all their buckets of cash, also couldn’t avoid the issue. And, finally, hold cloud providers accountable. If they’re unstable and not providing the service you expect, leave. We’ve got tons of great options these days, especially if you don’t care about proprietary solutions. Good luck y’all! |
AWS (and others) make egress so insanely expensive that no startup can seriously consider leaving with its data. There is also a constant push either to not support open protocols or to extend them in ways that make it hard to migrate a code base.
If the advice is effectively to use only managed open source components, then why AWS at all? Most competent mid-sized teams can do that much cheaper with a colo provider like OVH/Hetzner.
The point of investing in AWS is not to outsource running base infra. It is to leverage the kind of cloud-native services that we mere mortals cannot hope to build or maintain, and that point is lost if we should stay away from them.
Also, this "avoid us-east-1" advice is a bit frustrating. AWS does not have to always experiment with new services in the same region. It is not marked as an experimental region, nor does it have reduced SLAs. If it is inferior/preview/beta, then call it out in the UI and the contract. And what about when there is no choice? If CloudFront is managed in us-east-1, should we now not use it? Why use the cloud then?
If your engineering team only discovers scale problems in us-east-1, at the same time as your customers do, perhaps something is wrong? AWS could limit new instances in that region and spread the load. Playing with customers who are at your mercy like this, just because you can, is not nice.
Disney can afford to go down, or to build their own cloud. Small companies don't have the deep pockets to do either.