Hacker News new | ask | show | jobs
by danielmewes 3990 days ago
3 nodes are indeed required to have automatic failover working.

We recommend using 3 (or more) nodes and replicating all tables with a replication factor of 3. That way each node is going to maintain a full copy of the data. In case of a 1 node failure (with or without sharding, as long as the replication factor is set to 3), one of the two remaining servers is going to take over for the primary roles that might have been hosted on the missing server.

If you want to sustain a two node failure without any data loss, you will need 5 servers and also set the replication factor to 5. 5 is the lowest number that guarantees that if two nodes fail, you will still be left with a majority of replicas (i.e. 3 out of 5). A majority like that is required to guarantee data durability and to enable auto failover. If you are ok with losing a few seconds worth of changes and do not require automatic failover, even a 3 node setup can be enough to sustain a two node failure. In that case you will have to perform a manual "emergency_repair" to recover availability (see http://docs.rethinkdb.com/2.1/api/javascript/reconfigure/ for details), but most of the data should still be there.

In addition you can shard the table into 3 shards for additional performance gains. This is for the most part unrelated to availability and data safety.

2 comments

With data sharded across three nodes, why can't a replication factor of two handle one node's failure? Why does the replication factor need to be three?
That's a great question.

In our current architecture, we perform failover by selecting a new primary among the existing replicas for a given shard. We do not however add new servers to the replica list during a failover condition. Therefore, if you have a table with two replicas per shard and have a one node failure, you would only have a single replica left.

Currently we do not automatically failover in that case. We always require a majority of replicas to be present for any given shard before failing over the primary of that shard.

I believe we could relax this constraint in this specific case, and allow failing over despite only a single replica being left, as long as we still have a majority for the table overall. (I'm going to double-check this with one of our core developers...) Even if we did perform the failover, that would not restore write availability of the table though, since there wouldn't be a majority to acknowledge a write (unless you set write_acks to "single"). It could still be useful to restore availability for up to date read queries. We might add support for auto failover in this scenario in the future.

So, on a minimum three-node setup, three replicas + three shards per table are doable? Are there any caveats?
Yes, doable and desirable. There shouldn't be any caveats to this deployment.