by warunsl 3 days ago
> You simply take one of those redundant clusters and migrate one half of it over.

For that half you are migrating, you are essentially operating without redundancy. If these are serious production workloads, the tradeoff is not as simple as you make it seem.


The way a cluster works is you have a giant pool of resources, say 33-50% larger than the workload. The workload is a dozen VMs. The cluster is 8 giant compute servers and 2 giant storage servers acting as one giant compute and storage unit. For redundancy you have extra clusters lying around with no workload, but they are added as failovers.

Normally, if one server on a production cluster goes down, the other members of that cluster will seamlessly take over. This is where the extra capacity comes in. You don't migrate the workload to another cluster. You just lose overhead capacity. If you lose too much, then you start migrating parts of the workload to the failover, not the entire thing.
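To put rough numbers on that, here is a minimal sketch of the headroom arithmetic. The function name `spillover` and the figures (8 compute nodes at 10 capacity units each, a 60-unit workload, so the cluster is about 33% larger than the workload) are made up for illustration and are not taken from any real deployment or tool.

```python
def spillover(total_nodes, failed_nodes, node_capacity, workload):
    """Return (headroom, spill) for a cluster after some nodes fail.

    headroom: spare capacity still left on the cluster.
    spill: the part of the workload that no longer fits and would
           have to move to a failover cluster.
    All numbers are in the same made-up capacity units.
    """
    surviving = (total_nodes - failed_nodes) * node_capacity
    headroom = max(surviving - workload, 0)
    spill = max(workload - surviving, 0)
    return headroom, spill


# Hypothetical cluster: 8 nodes x 10 units, 60-unit workload.
for failed in range(4):
    headroom, spill = spillover(8, failed, 10, 60)
    print(f"{failed} failed: headroom={headroom}, spill to failover={spill}")
```

With these assumed numbers, the first two failures only eat into headroom. Only the third failure pushes 10 units (a sixth of the workload) onto the failover, and the rest stays where it is.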

You usually don't have to use your redundant cluster at all until it's time to rebuild the failed cluster. At that point you might pick one of the spare clusters you keep around for redundancy and migrate all or part of the production workload to it while you fix the production cluster.

When doing a big migration you take a percentage of your redundancy and convert it to the new environment. This is your staging environment. Once it is capable of doing work, you slowly grow it out and shrink the old environment at the same time.
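As a rough sketch of that grow-and-shrink process, the snippet below tracks node counts only. The function name `migration_plan`, the batch size, and the node counts are hypothetical. It also assumes the workload on a batch of old nodes has already been drained to the new environment before those nodes are converted.

```python
def migration_plan(old_nodes, seed_nodes, step):
    """Return a list of (old, new) node counts for a gradual migration.

    seed_nodes: spare/redundant nodes converted first to form the
                staging environment.
    step: how many old nodes are converted per round, after their
          workload has been moved to the new environment (assumed).
    """
    old, new = old_nodes, seed_nodes
    plan = [(old, new)]
    while old > 0:
        moved = min(step, old)  # the last batch may be smaller
        old -= moved
        new += moved
        plan.append((old, new))
    return plan


# Hypothetical: 8 production nodes, 2 spare nodes seed the new side.
for old, new in migration_plan(8, 2, 2):
    print(f"old environment: {old} nodes, new environment: {new} nodes")
```

The total node count stays constant at every step, so the redundancy you gave up to seed the new environment comes back once the old one is fully converted.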

This is basically how HA works with VxRail. I buy more VxRail than I will actually host on, because if a node fails the VMs can be moved, not always without downtime but with no loss. If I run out of HA nodes or start running low on capacity, then Aria will start sending alerts.