by eloff 3835 days ago
GoshawkDB is unique as it allows you to configure the tolerance to failure independently of the size of the GoshawkDB cluster. For example, you could have a cluster of 5 nodes and require that it continues to operate unless more than 2 nodes become unreachable. Or you could have a cluster of 20 nodes and require that it continues to operate unless more than 3 nodes become unreachable. The only requirement is that if F is the number of unreachable nodes you wish to be able to tolerate, then your minimum cluster size is 2F + 1. You may of course choose to have a cluster size larger than this. Currently no other data store that I'm aware of offers this flexibility.

I fail to see the point of making your cluster unavailable before you've lost so many nodes that you no longer have a quorum. It seems odd to have a cluster that can handle e.g. 4 node failures, and take it offline after only 2. Why would anyone want a feature like that?

1 comments

It's in the next paragraph. The data has to have more replicas to be resilient to more failures. If you have a lot of data, storing 20 replicas is going to be really expensive and probably unnecessary.
Yup. On the whole, to increase performance, you don't want more and more nodes to be contacted. Remember that every txn needs a minimum of F+1 replicas to vote on it, so if F is really large, then even though they vote in parallel, you have a greater chance of network delay and greater load on all machines. GoshawkDB lets you increase cluster size without increasing F, which gets you greater performance.
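To put rough numbers on the trade-off, here is a minimal sketch of the arithmetic from the thread: the minimum cluster size of 2F + 1 and the minimum of F+1 replicas voting per txn. The function names are made up for illustration and are not part of GoshawkDB's API; it only restates the two formulas quoted above.

```python
# Illustrative arithmetic only; these helpers are hypothetical and
# are NOT GoshawkDB functions.

def min_cluster_size(f: int) -> int:
    """Smallest cluster that can tolerate f unreachable nodes (2F + 1)."""
    return 2 * f + 1


def votes_per_txn(f: int) -> int:
    """Minimum number of replicas that must vote on each txn (F + 1)."""
    return f + 1


def max_tolerable_failures(cluster_size: int) -> int:
    """Largest F a cluster of this size could be configured with."""
    return (cluster_size - 1) // 2


def is_valid_config(cluster_size: int, f: int) -> bool:
    """True if the cluster is at least 2F + 1 nodes for the chosen F."""
    return f >= 0 and cluster_size >= min_cluster_size(f)


if __name__ == "__main__":
    # (cluster size, F): the two examples from the post, plus a 20-node
    # cluster with F pushed as high as 2F + 1 allows.
    for cluster_size, f in [(5, 2), (20, 3), (20, max_tolerable_failures(20))]:
        print(
            f"nodes={cluster_size} F={f} "
            f"valid={is_valid_config(cluster_size, f)} "
            f"min votes per txn={votes_per_txn(f)}"
        )
```

With 20 nodes, F = 3 needs at least 4 replicas to vote on each txn, while the maximum F = 9 needs at least 10, which is the extra replication and voting cost the replies are describing.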