|
|
|
|
|
by aaronblohowiak
5022 days ago
|
|
If Github hasn't gotten their custom HA solution right, will you? Digging into their fix, they disabled automatic failover -- so all DB failures will now require manual intervention. While addressing this particular (erroneous) failover condition, it does raise minimum down time for true failures. Also, their mysql replicant's misconfiguration upon switching masters is also tied to their (stopgap) approach to preventing the hot failover. So, the second problem was due to a mis-use/misunderstanding of maintenance-mode. How is it possible that the slave could be pointed at the wrong master and have nobody notice for a day? What is the checklist to confirm that failover has occurred correctly? There is also lesson to be learned in the fact that their status page had scaling issues due to db connection limits. Static files are the most dependable! |
|