|
|
|
|
|
by hueving
3393 days ago
|
|
>but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure. I don't buy this. I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services. You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact. Guess when Amazon does maintenance? That's right, you don't know and one screw up can mean instances in "degraded status" (a.k.a. you might as well terminate it and launch a new one) or all of S3 is down during critical business hours. Of course your own hardware in a single data-center is going to be exposed to high probability of failures, but that's the equivalent of using a single instance in EC2 (which I have lost two of in the last 7 years of managing 15 or so of them for a small company). I will admit that it takes strong ops skills to maintain high uptime on your own hardware, but that's just due to a lack of good open source tooling in this area. I would rather see a movement to improve tooling rather than continue to boost the stranglehold the public cloud is putting on everyone. |
|
Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.
Yes, with enough money you can match Amazon for uptime or scalability or whatever metric you prefer. For the same money you can probably buy triple the capacity in Amazon or your preferred cloud provider, so this is mostly a game for people with really deep pockets, really large scale, or really poor budgeting.
> You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact.
How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.
Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.
> all of S3 is down during critical business hours.
I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime. If you stand up a fairly complex system comprised of a number of loosely-coupled services, you're going to end up experiencing some outages, because you'll face the same challenges as Amazon and those guys aren't idiots. You'll lose your message queue due to a bug, or you'll lose a network switch and realize your failover takes 30 minutes to complete instead of the 5 seconds you hoped for, or you'll accidentally DDOS a subsystem when exercising a failover or a system upgrade, or something else. Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.