|
|
|
|
|
by StreamBright
3745 days ago
|
|
Disclaimer: I used to work for Amazon, does not own any AMZN anymore 1. AWS does explicitly tells you up front that smaller instance sizes come with smaller network throughput. This is well known and well communicated even when you browse the instance offerings. Doing 7x better for a 4 core instance is hardly relevant (depending on the actual CPU type though), being able to saturate your pipe would probably consume much of your CPU time and you could hardly do anything else on the box. You can prove me wrong on this one. Synthetic benchmarks are not really relevant for production use cases. A good read in the subject: http://www.brendangregg.com/activebenchmarking.html 2. On reducing OPS. You are implying that these OPSy things are not automated. You should ask your SRE co-workers about this one. For running a website this scale, you absolutely need to automate cases when the server is rebooted. Meaning, on shut down it needs to remove itself from the load-balancer or from the resource pool, and when it comes back it has to put itself back. Worst case scenario you can just terminate the instance and let auto-scaling do its job. All of these are completely human attention free operations in most cases, but I do understand that some smaller customers are not so advanced with automation and GCP might be optimizing for those clients. 3. I do not have any question, as I pointed out that in the article the author is talking about EBS while it might appear to the reader that he is talking about some sort of local SSD. 4. Great! I would like to know it too! We should petition together. :) |
|
1. PerfKitBenchmarker includes meaningful benchmarks for things like Redis, Aerospike, Memcache, etc. We expect GCE to score well on these when measured in terms of performance/$, and chunk of why we expect that is from superior network performance. Even small instance sizes tend to saturate their provisioned network long before they saturate provisioned CPU; GCE provisions more network (up to 2 Gbps/vCPU per our public docs).
This also applies to custom VM shapes. This allows workloads like memcache (which require very little CPU per request, typically) to be provisioned on small instances that still have relatively beefy networks with oodles of RAM with costs proportioned appropriately.
2. GCE handles instance failures differently from EC2. Certainly both platforms will have instance failures that cannot be solved with migration; this is absolutely something software stacks must work around. Live migration allows us to drive down the number of failure modes which cause an discontinuity in instance lifecycle, but obviously they cannot be eliminated entirely.
That said, when an instance in GCE fails it is by default restarted as quickly as possible (possibly on another host). To the guest this appears as an unplanned reboot. My understanding is that you can accomplish the same on EC2 by 'recovering' and instance[0], and that further you can automate this recovery with CloudWatch, but none of that is required on GCE.
I think we're in full agreement in terms of automating OPS, I'm just of the (obviously strongly biased) opinion that GCP is ahead in terms automating things on behalf of customers "out of the box".
[0]: I previously worked at Amazon, but in Retail at a time when the deployment tools for EC2 were... somewhat exotic. I lack experience with what the general best practices recommended to external customers is.