by fizx 4573 days ago
I think Antirez is simply doing a master/slave system with async replication. This is what mysql, solr, elasticsearch, kafka, etc do. This is not an unusual model at all.

In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick at most two).

There's a bunch of people who've made this choice, but why? C incurs a synchronization cost. A means that you have to reconcile different writestreams. If you want consistent semantics in a very fast database, you can't pick either. So you end up somewhere in the middle of the triangle.

The consequence of picking zero is that you'll lose a time window of data roughly proportional to the replication lag when the master fails/partitions. There are many applications for which bounded data loss is a perfectly reasonable paradigm.
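To make the "window proportional to replication lag" point concrete, here is a minimal sketch of a toy master/replica pair with async replication. This is not Redis code; `AsyncReplicatedStore`, `replicate`, and `fail_over` are hypothetical names made up for illustration, and the model assumes a single replica that applies writes in order.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy master/replica pair with asynchronous replication (hypothetical model)."""

    def __init__(self):
        self.master = {}
        self.replica = {}
        # Writes the master has acknowledged but the replica has not applied yet.
        self.backlog = deque()

    def write(self, key, value):
        # The master acknowledges immediately; replication happens later.
        self.master[key] = value
        self.backlog.append((key, value))

    def replicate(self, max_ops):
        # Apply up to max_ops pending writes to the replica, in order.
        for _ in range(min(max_ops, len(self.backlog))):
            key, value = self.backlog.popleft()
            self.replica[key] = value

    def fail_over(self):
        # The master dies and the replica is promoted.
        # Everything still sitting in the backlog is lost.
        lost = [key for key, _ in self.backlog]
        self.backlog.clear()
        self.master = dict(self.replica)
        return lost


store = AsyncReplicatedStore()
for i in range(10):
    store.write(f"k{i}", i)
store.replicate(max_ops=7)  # the replica lags by 3 writes
lost = store.fail_over()
print(lost)  # ['k7', 'k8', 'k9']
```

The lost set is exactly the backlog at the moment of failure, so in this toy model the loss is bounded only as long as the lag itself is bounded.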

4 comments

I assert that this design actually allows for arbitrarily long windows of data loss, but I haven't verified that the implementation matches my understanding of antirez's WAIT/failover algorithm yet. Pretty sure though.
If you read the Redis Cluster documentation, the data loss is acknowledged, and it even explains the most obvious failure modes where this happens. WAIT does not provide strong consistency either, and is actually not even documented in the Cluster doc. However, WAIT as it stands makes data loss less likely.

Since everything in the Redis Cluster design and documentation is about trading consistency for performance, I would expect the system to be analyzed for what it is: is it good at providing weak but practically usable consistency and performance? In short, does it respect the design intent?

Saying again and again that it does not feature consistency is a sterile exercise.

The strongest guarantee needs to be how quickly the system detects a failure and then either recovers or halts.
> In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick at most two).

That might be true, but describing something in the CAP framework is not the only salient thing you can say about a distributed storage system. CAP only characterizes the failure modes.

You also have to think about what happens in the normal case. In the normal case you have a consistency vs. latency tradeoff. People have written about this, but unfortunately I don't think this broader way of thinking has gotten the attention it deserves:

http://dbmsmusings.blogspot.com/2011/12/replication-and-late...

http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-an...

"If you examine PNUTS through the lens of CAP, it would seem that the designers have no idea what they are doing (I assure you this is not the case). Rather than giving up just one of consistency or availability, the system gives up both!"

"The reason is that CAP is missing a very important letter: L. PNUTS gives up consistency not for the goal of improving availability. Instead, it is to lower latency. "

I was just making a point: since even people with a better-than-average understanding of distributed systems can't quite put their finger on what Redis Cluster's failure modes or guarantees are, perhaps a bit more clarification is needed.

Even you are using "I think antirez is simply doing..."

I'm not arguing that the design isn't useful or that it's a bad design. Salvatore's code is great; I think it is just a matter of more testing, docs, or a blog post.

> In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick at most two).

No, he picked P, Partition Tolerance. You can't not pick P. Unless you assume a never-failing network with never-failing nodes.