Hacker News
by bkirwi 4265 days ago
In theory, Jepsen or a Jepsen-like system should be able to check any of these failure modes.

On the other hand, it sounds like Redis Cluster offers few hard guarantees; instead, it promises that failures should be rare 'in practice'. Which is a fine thing for a tool to do, of course, but it makes the system less amenable to the kind of stress-testing Jepsen does -- since running inside Jepsen's little universe is about as far from normal operation as you can get. If you already know that a system can fail in a certain way, getting Jepsen to reproduce that failure tells you very little.

If you'd like to make this kind of testing possible, it would be useful to state as many 'positive' rules as possible that Redis Cluster should always respect -- things like "if a majority of nodes are fully connected, they should always accept writes" and "an unpartitioned cluster should always agree on the same value" -- alongside the documentation on ways it might fail. This way, clients can be assured of the 'bare minimum' that the system supports, and tools like Jepsen can give you more useful information.
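Rules like these can be checked mechanically. A minimal Python sketch, assuming a hypothetical test harness (not Jepsen's own API) that logs, for every write, the receiving node, the set of nodes it was fully connected to at that moment, and whether the write was acknowledged, plus a final read of every key from every node once all partitions have healed. The names `WriteAttempt`, `majority_accepts_writes`, and `unpartitioned_cluster_agrees` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WriteAttempt:
    # Hypothetical record produced by a test harness, not a real Jepsen type.
    node: str                  # node that received the write
    component: frozenset[str]  # nodes fully connected to `node`, itself included
    accepted: bool             # whether the node acknowledged the write


def majority_accepts_writes(attempts: list[WriteAttempt], cluster_size: int) -> list[WriteAttempt]:
    """Rule 1: a node inside a fully connected majority never rejects a write.

    Returns the attempts that violate the rule.
    """
    majority = cluster_size // 2 + 1
    return [a for a in attempts if len(a.component) >= majority and not a.accepted]


def unpartitioned_cluster_agrees(final_reads: dict[str, dict[str, str]]) -> dict[str, dict[str, str | None]]:
    """Rule 2: with no partitions left, every node reports the same value per key.

    `final_reads` maps node -> {key: value}; a node missing a key reads as None.
    Returns, for each conflicting key, what every node reported.
    """
    keys = set().union(*(reads.keys() for reads in final_reads.values()))
    conflicts = {}
    for key in sorted(keys):
        seen = {node: reads.get(key) for node, reads in final_reads.items()}
        if len(set(seen.values())) > 1:
            conflicts[key] = seen
    return conflicts


# Example: a five-node cluster where a majority-side node wrongly refused a write.
attempts = [
    WriteAttempt("a", frozenset({"a", "b", "c"}), True),
    WriteAttempt("b", frozenset({"a", "b", "c"}), False),  # violates rule 1
    WriteAttempt("e", frozenset({"d", "e"}), False),       # minority side: not covered by rule 1
]
print(majority_accepts_writes(attempts, cluster_size=5))
print(unpartitioned_cluster_agrees({"a": {"x": "1"}, "b": {"x": "1"}, "c": {"x": "2"}}))
```

Each check returns the violations rather than a bare boolean, so a failing run points straight at the offending write or key. A rejected write on the minority side is deliberately not flagged, since neither rule says anything about it.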


Oh, there are definitely hard rules like that. For example, a minority partition never accepts queries, and when there are no partitions at all, Redis Cluster guarantees to converge on a single value for each key and on a single view of the cluster configuration. I'll try to document these things better, but basically they arise from the simple algorithm that makes the configuration eventually consistent.
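The convergence property can be illustrated with a deliberately simplified model. This sketch assumes (a simplification, not a description of Redis's actual code) that every claim on a hash slot carries an epoch number and that a node resolves conflicting claims by keeping the one with the higher epoch; `merge` and `gossip_until_stable` are hypothetical names. Because a per-slot maximum is commutative, associative, and idempotent, any order of exchanges among fully connected nodes ends with every node holding the same view.

```python
from typing import Dict, Tuple

# Toy model: each node's view maps a hash slot to the (epoch, owner) claim it believes.
View = Dict[int, Tuple[int, str]]


def merge(local: View, remote: View) -> View:
    """Keep, for each slot, the claim with the larger (epoch, owner) pair.

    Higher epoch wins; ties are broken by owner name only to keep the toy deterministic.
    """
    merged = dict(local)
    for slot, claim in remote.items():
        if slot not in merged or claim > merged[slot]:
            merged[slot] = claim
    return merged


def gossip_until_stable(views: Dict[str, View]) -> Dict[str, View]:
    """With no partitions, every node merges every other node's view until nothing changes."""
    views = {node: dict(view) for node, view in views.items()}
    changed = True
    while changed:
        changed = False
        for a in views:
            for b in views:
                if a != b:
                    new = merge(views[a], views[b])
                    if new != views[a]:
                        views[a] = new
                        changed = True
    return views


views = {
    "A": {0: (1, "A"), 1: (1, "A")},
    "B": {0: (2, "B")},  # B took over slot 0 with a newer epoch
    "C": {1: (1, "A")},  # C has not heard about slot 0 yet
}
final = gossip_until_stable(views)
print(final)
```

The loop terminates because every change strictly raises some slot's claim and there are finitely many claims. The real protocol also has to handle failure detection, epoch collisions, and replica promotion, none of which this toy attempts.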
Neat; that should be very useful.

It's great to see that Redis has an official story for clustering/failover out; like you said in the post, the worst distributed systems are the ones you have to rewrite every single time. It's going to be interesting watching this evolve.