by rednixion
2515 days ago
1. Sounds like they are saying that until the changes to take the cluster out of the request path are made, failover cannot be trusted, so the only immediate action available is to disable the automatic failover logic and block any live maintenance jobs that would need to take it offline. That's not unreasonable, depending on the timeline for implementing the changes, when you compare the risk of a cluster-killing hardware failure (assuming there is at least one-node tolerance) against behavior inside your buy/sell pipeline being "undetermined". Having no failover is probably a great way to keep the sprint work priority at the appropriate level.

2. Load testing? Accurate end-to-end load tests are painful to bootstrap from nothing and require an environment that can be expensive, but they are worth their weight in gold depending on how critical downtime is, or if everything in your platform pulls from the same resource pool (i.e. an appliance-based deployment).