Hacker News
by DanWaterworth 5180 days ago
What happens in the case of a network partition?

I can't speak for the solution posted in this article, but I think this is one of the main design concerns. With redis-sentinel (I described it in another comment in this thread), the trick is that you place the sentinels wherever you want and configure the minimum number of sentinels that must agree before a failover starts. So what happens during a partition depends on where you place the sentinels and what minimum agreement you configure, which makes it easy to get the behavior you want.
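A minimal sketch of the "minimum agreement" idea in Python. `Sentinel`, `should_failover` and the five-sentinel layout are hypothetical, made up for illustration. This is not redis-sentinel's code, and it ignores everything else a real sentinel does, such as deciding which sentinel actually performs the failover.

```python
from dataclasses import dataclass


@dataclass
class Sentinel:
    name: str
    sees_master_down: bool  # this sentinel's own view of the master


def should_failover(sentinels: list[Sentinel], quorum: int) -> bool:
    """Hypothetical rule: fail over only if at least `quorum` sentinels agree the master is down."""
    votes = sum(1 for s in sentinels if s.sees_master_down)
    return votes >= quorum


# Hypothetical partition with 5 sentinels and quorum = 3:
# the master is still reachable from s1 and s2, but cut off from s3, s4 and s5.
minority = [Sentinel("s1", False), Sentinel("s2", False)]
majority = [Sentinel("s3", True), Sentinel("s4", True), Sentinel("s5", True)]

print(should_failover(majority, quorum=3))  # True: 3 agreeing sentinels meet the quorum
print(should_failover(minority, quorum=3))  # False: these sentinels still see the master
```

Changing where the sentinels sit, or raising the quorum to 4, changes the outcome of the same partition. This is the "depends on where you place the sentinels" point.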
Something to keep in mind is that sometimes the Redis server can't accept new connections anymore because of limits etc., but it is still serving the old connections. In that case I don't think you want to just fail over... the tricky part is knowing whether the server is really down.
There is no sane condition in Redis that will make it stop replying entirely AFAIK: even if you set maxclients to 1, the next clients will get an error back (and the connection closed ASAP). But yes, it is important to understand what "down" means. I think one of the safest definitions is "down == unreachable": if you get no reply at all for the configured amount of time consecutively, the server is down. And of course the other redis-sentinels have to agree before the failover starts.
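The "down == unreachable" rule could be sketched like this. `ReachabilityTracker` and its `down_after` window are hypothetical names chosen for illustration. The key assumption is that *any* reply, including an error such as a maxclients refusal, counts as proof the server is reachable.

```python
class ReachabilityTracker:
    """Treat a server as down only after `down_after` seconds with no reply at all.

    Hypothetical helper for illustration; not part of Redis or redis-sentinel.
    Timestamps are passed in explicitly so the logic is easy to test.
    """

    def __init__(self, down_after: float, start: float):
        self.down_after = down_after
        self.last_reply = start  # assume the server was reachable when tracking began

    def record_reply(self, now: float) -> None:
        # Any reply counts, including an error such as a maxclients refusal:
        # the server answered, so it is reachable.
        self.last_reply = now

    def is_down(self, now: float) -> bool:
        # Down only if the silence has lasted the whole configured window.
        return now - self.last_reply >= self.down_after


t = ReachabilityTracker(down_after=30.0, start=0.0)
t.record_reply(10.0)     # e.g. an error reply still counts as a reply
print(t.is_down(35.0))   # False: only 25 s of silence so far
print(t.is_down(40.0))   # True: 30 s with no reply at all
```

In a real monitoring loop the timestamps would come from something like `time.monotonic()`. A local "down" verdict would still only be one vote toward the sentinels' agreement.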
The gem has a configurable --max-failures option that can be passed to the failover daemon. The daemon only marks a node as unreachable after it fails to ping it that many times (default 3). This could probably be improved too, but it was meant to avoid false positives.
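A sketch of a failure threshold like the one --max-failures describes. `FailureCounter` is a hypothetical class, not the gem's implementation, which isn't shown in the thread. It also assumes the failures must be consecutive, so one successful ping resets the count. The comment doesn't say either way.

```python
class FailureCounter:
    """Mark a node unreachable after `max_failures` failed pings in a row.

    Hypothetical class for illustration. The default of 3 mirrors the default
    mentioned in the comment; the reset-on-success behavior is an assumption.
    """

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record_ping(self, ok: bool) -> bool:
        """Record one ping result; return True if the node is now considered unreachable."""
        if ok:
            self.consecutive_failures = 0  # any successful ping resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.max_failures


counter = FailureCounter(max_failures=3)
results = [counter.record_ping(ok) for ok in [False, False, True, False, False, False]]
print(results)  # [False, False, False, False, False, True]
```

The two early failures are forgiven because a success intervenes. Only the later run of three triggers the "unreachable" verdict, which is how a threshold like this trades detection speed for fewer false positives.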