by jedberg
5573 days ago
We're currently in the process of replacing every one of our hosts with new OS versions, and as we do this we are in fact moving to EBS-backed instances. Those instances actually show the same problems, but they aren't too bad: once you boot them, you don't need the root volume much (that's what the instance storage is for).

Q1. I still don't get the use case for DB storage on ephemeral storage.
Q2. If EBS is the problem, why are you migrating to S3-backed EBS boot volumes? The problem there is still the window between snapshots, even though it will be shorter.
Some comments: it will only be a matter of time before S3 disks and hardware start dying en masse like EBS...
I talked with Ketralnis several years ago and know how many VMs you were running back then. I'm pretty sure you're not too far off from that count even today (even if it has doubled).
You can still virtualize on a good set of dedicated hardware to emulate your current 'network environment' and get up and running in the near term _asap_. Obviously you'd build out of that VM environment (with your load) as the days go by. Seriously, look into a parallel switchover, though.
If EBS is in fact as big an issue as has been shown, you may really need to start migrating off unless you want dedicated employees monitoring system health on AWS. Eventually, if the problems continue, that is what will happen, with no time left even to develop automation... And why build automation on top of a pile of instability?
Don't forget that adding more VMs at this high failure rate increases soft management costs and will eventually eat into your development time...
I don't work for Rackspace (I think they're quite expensive), but you guys might benefit from that level of care so you can focus on the real issues.