|
|
|
|
|
by k1w1
4058 days ago
|
|
As an AWS user this type of thing gives me cause for concern: At 2015-04-01 00:00 UTC, the Amazon EC2 "provisioned I/O" volume on which most
of this metadata was stored suddenly changed from an average latency of 1.2
ms per request to an average latency of 2.2 ms per request. I have no idea
why this happened -- indeed, I was so surprised by it that I didn't believe
Amazon's monitoring systems at first -- but this immediately resulted in the
service being I/O limited. A sudden doubling of latency can have dire consequences on any system. Knowing that such unexpected changes are possible makes it built trust in your environment, even if it is running fine today. |
|
This is why I don't use AWS for anything non-trivial, and I am wary of people who put critical infrastructure on it. (EG: I Don't care about netflix, that service can run on AWS fine, but coinbase, for instance, if I was their customer and they ran on AWS I would stop being their customer.)
Whenever AWS problems come up people talk about how "AWS is so much more efficient, you just outsource that stuff to the experts".
But that seems to imply that hosting on your own hardware in your own office is the only alternative. Of course we stopped doing that in the 1990s.
With AWS you have to know Linux and have ops people, that's true everywhere. With AWS you have the additional burden of learning the AWS APIs and learning how to use AWS, which isn't transferrable, so that's a higher cost. With AWS you have to architect around the limitations of the way AWS is built and your architecture becomes AWS specific if you use those APIS, so that's an additional cost. You don't need any less ops people, probably more, than going with another hosting service like Digital Ocean or Backspace. And if you go with something like Hetzner you pay 1/5th to 1/10th for machines with a lot more performance and local storage. (Though you get the additional latency of being located in Europe, if your primary customers are the USA.)
Of course, I'm also prejudiced. I worked at Amazon and saw how the sausage was made and was not impressed. When AWS was announced as "running on the same infrastructure that powers Amazon.com!!!" as if it was a feature, I cringed. Amazon.com was having outages of parts or major components on a weekly basis at that time. Much of AWS is actually running on bespoke software (so not actually tested by Amazon.com when introduced, though I'm sure portions have been moved over at gunpoint) ... which actually makes it worse. People were trusting their data to a service that pretended to be backing a major e-commerce site but was actually untested outside of the company at the time.
And what have we seen since? An unacceptable level of failures. (in my opinion, of course)
But people seem to be very forgiving. When it's happening everyone's in "how can we fix this mode" and then when it's fixed everyone forgets and goes back to thinking of AWS as always running.