by rdtsc
4573 days ago
To summarize quickly for those who didn't read the article: it focuses mostly on the new "clustering" aspect, not on the "classic" single Redis server (if you wish). I think it is important to be honest with users and make it clear what happens behind the scenes and how data could be lost. Salvatore has done most of this; it maybe just needs to be a bit more explicit, as there still seems to be some confusion around. All this is in light of two things: 1) With the popularity of distributed systems these days, and the amount of talk and churn around them, people sort of expect a point on the map in the CAP triangle. So just saying "we kind of do this, and we provide some C, a little A and a dash of HA" was probably OK 5 years ago; now it needs a bit more definition. 2) Other database systems have misled users about what they could provide (you know which one I am talking about), which resulted in lost data, so there is a bit of apprehension and a higher bar that needs to be met for a db product to be accepted. One good thing that came out recently is NoSQL database writers/vendors pushing for more rigorous tests: tests that run for weeks and months, consistency tests, network partition tests as run by Aphyr. It is a very good idea that those things are talked about and defined better.
|
In CAP theorem terms, Redis has picked zero (remember CAP theorem is pick at most two).
There's a bunch of people who've made this choice, but why? C incurs a synchronization cost. A means that you have to reconcile divergent write streams. If you want consistent semantics in a very fast database, you can't pick either. So you end up somewhere in the middle of the triangle.
The consequence of picking zero is that you'll lose a time window of data roughly proportional to the replication lag when the master fails/partitions. There are many applications for which bounded data loss is a perfectly reasonable paradigm.
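
To put rough numbers on that window, here is a toy model in Python. It is only a sketch under stated assumptions, not how Redis actually implements replication: the master acknowledges every write immediately, each write reaches the replica a fixed `lag` later, and the master dies at `fail_time`. The function name `lost_writes`, the fixed lag, and the one-write-per-millisecond workload are all made up for illustration.

```python
def lost_writes(write_times, lag, fail_time):
    """Writes the master acknowledged that had not reached the replica
    when the master failed (all times in integer milliseconds).

    Assumption: a write issued at time t arrives on the replica at t + lag.
    """
    return [
        t for t in write_times
        if t <= fail_time and t + lag > fail_time
    ]


# Hypothetical workload: one write per millisecond for two seconds.
writes = list(range(2000))

# The master fails at t = 1000 ms.
print(len(lost_writes(writes, lag=50, fail_time=1000)))   # 50
print(len(lost_writes(writes, lag=100, fail_time=1000)))  # 100
```

Doubling the lag doubles the number of acknowledged-but-lost writes, which is the "roughly proportional to the replication lag" part. Real lag is not constant, so treat this as a bound on intuition rather than a prediction.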