|
|
|
|
|
by alblue
3787 days ago
|
|
Power outage in DC brought many machines down. Redis clusters failed to start owing to disk issues (not cleanly unmounted?). The reboot of remaining machines uncovered an unknown dependency on the machines needing the redis cluster to be up in order to boot. There were other learning points such as immediately going into anti DDoS mode and human communication issues that didn't realise or escalate the problem until some time after the issues started occurring. |
|