Hacker News
by rdtsc 3988 days ago
Based on your experience, is there a common bug or scenario that you often see overlooked? For example, what happens during the transition between leaders, or how multiple failures (multiple netsplits...) are handled?
1 comment

I can't really identify a common problem. Things I've seen include:

* After a complete, planned shutdown, neither server is willing to start until it sees the other one online. In the end, neither ends up booting.
* A failover occurs, at which point you find out the hard way that some state is being stored in a non-replicated file. I've seen this with several different Asterisk HA solutions in particular.
* A failover occurs, and non-database-aware storage snapshots leave the redundant server with a non-mountable mirror of the database.
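
To make the first case concrete, here is a minimal sketch of a cold-start check that does not wait on the peer forever. Everything in it is hypothetical and not taken from any particular HA product: `should_start`, the `peer_online` probe, the `was_last_active` flag (assumed to be persisted locally at shutdown), and the timeout values. The idea is only that a node waits a bounded time for its peer, and the node that was active last is allowed to come up alone.

```python
import time
from typing import Callable


def should_start(
    peer_online: Callable[[], bool],
    was_last_active: bool,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> str:
    """Decide how a node in a two-node pair should boot (hypothetical helper).

    Returns:
      "join" - the peer was seen within the timeout; start normally.
      "solo" - the peer never appeared, but this node was the last active
               one, so it may start alone instead of waiting forever.
      "wait" - the peer never appeared and this node was the standby;
               keep waiting (or alert an operator) rather than risk
               serving stale data.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if peer_online():
            return "join"
        time.sleep(poll_s)
    return "solo" if was_last_active else "wait"
```

Starting alone after a timeout trades the boot deadlock for a split-brain risk if the peer is actually up but unreachable, so a real deployment would normally pair something like this with fencing or a third-party witness.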