Hacker News
by antirez 4265 days ago
Hello Igor,

Basically, we can rule out synchronous systems: they are a product mismatch from the POV of Redis. We are left with AP systems. AP systems that try to do as well as they can in terms of availability, and of the consistency model provided (even if it is not "C"), do these two additional things that Redis Cluster is not able to do:

1) They are available in the minority partition. Redis Cluster has some degree of availability, but not to that extent. True "A" means serving queries even if you are the only node left.

2) They provide more write safety, unless you use them with last-write-wins strategies or the like.

"1" is actually related to "2". If you merge values later, you can accept writes even if the node has no clue about what is the state of the rest of the system.

About "2": if you look around, you'll quickly see that there are no available systems that are AP and can easily model Redis data structures with the same performance and the same number of elements per value. This is because merging huge values is time-consuming, requires a lot of metadata, and so forth.
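To make the metadata cost concrete, here is a minimal sketch of a mergeable set in the style of an observed-remove set. It is not how Redis or Riak implement anything; the class `ORSet` and its methods are hypothetical names chosen for illustration. Every add gets a unique tag and every remove keeps the tags it has seen, so the bookkeeping grows beyond the number of visible members.

```python
import uuid
from typing import Dict, Set


class ORSet:
    """Illustrative observed-remove set (hypothetical, not a Redis type)."""

    def __init__(self) -> None:
        self.adds: Dict[str, Set[str]] = {}  # element -> unique tags of its adds
        self.removed: Set[str] = set()       # tags whose adds were removed

    def add(self, element: str) -> None:
        # Each add is tagged so a later merge can tell adds apart.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str) -> None:
        # Only the adds observed locally are cancelled.
        self.removed |= self.adds.get(element, set())

    def members(self) -> Set[str]:
        return {e for e, tags in self.adds.items() if tags - self.removed}

    def merge(self, other: "ORSet") -> None:
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def metadata_entries(self) -> int:
        # Number of tags kept around purely to make merging possible.
        return sum(len(t) for t in self.adds.values()) + len(self.removed)


# Two nodes diverge during a partition, then merge.
a = ORSet()
a.add("redis")
b = ORSet()
b.merge(a)

a.remove("redis")   # node A removes the element...
b.add("redis")      # ...while node B concurrently adds it again
b.add("riak")

a.merge(b)
```

After the merge, `a.members()` holds two elements while `a.metadata_entries()` counts four tags, which is the kind of per-element overhead the comment is pointing at.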

There is another constraint: I don't want a system with application-assisted merging. Recent Riak versions, for example, provide data structures with a default merge strategy, so we have examples of similar stuff. I can assure you that either metadata is needed to merge, which would make Redis a lot less memory efficient, or, sometimes, there are simply no good strategies. Take the "Hash" type, for example. There is a partition, and on one node I set the "name" field of user 1234 to "George" and on another node to "Melissa". There is no better merge strategy than last-write-wins, basically.
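The "George" versus "Melissa" case can be sketched as follows. This is only an illustration: the per-field `(timestamp, value)` representation and the function `lww_merge` are assumptions made for the example, not something Redis stores or exposes.

```python
from typing import Dict, Tuple

# Hypothetical representation: field name -> (write timestamp, value).
Hash = Dict[str, Tuple[float, str]]


def lww_merge(a: Hash, b: Hash) -> Hash:
    """Merge two divergent copies of a hash, keeping the newest write per field."""
    merged = dict(a)
    for field, entry in b.items():
        # Tuples compare by timestamp first, then by value, so ties are
        # resolved the same way regardless of argument order.
        if field not in merged or entry > merged[field]:
            merged[field] = entry
    return merged


# The two sides of the partition each accepted a write to user 1234's "name".
node_a: Hash = {"name": (100.0, "George")}
node_b: Hash = {"name": (105.0, "Melissa")}

merged = lww_merge(node_a, node_b)
```

Here `merged["name"]` ends up as `(105.0, "Melissa")` and the "George" write is silently dropped. Note that even this weakest strategy needs a timestamp per field, which is already extra memory compared to a plain hash.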

That said, there are certain things that don't add major overhead AND improve write safety. For example, if the connection is declared as "SAFE", we could cache all the SADD commands (and others with similar properties) until they are acknowledged by all the nodes serving this key. Unacknowledged writes are retained as a "log" of commands and are replayed when the node is turned into a slave of a new master.
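A rough sketch of that idea, under stated assumptions: `SafeWriteLog`, its methods, and the node names are hypothetical and do not correspond to any Redis Cluster API. It only shows the bookkeeping: keep each command until every node serving the key has acknowledged it, and replay whatever is left.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Command = Tuple[str, ...]


@dataclass
class SafeWriteLog:
    """Illustrative log of writes not yet acknowledged by every replica."""

    replicas: Set[str]
    pending: Dict[int, Tuple[Command, Set[str]]] = field(default_factory=dict)
    next_id: int = 0

    def record(self, *command: str) -> int:
        # Remember the command together with the nodes that still owe an ack.
        write_id = self.next_id
        self.next_id += 1
        self.pending[write_id] = (command, set(self.replicas))
        return write_id

    def ack(self, write_id: int, node: str) -> None:
        entry = self.pending.get(write_id)
        if entry is None:
            return
        entry[1].discard(node)
        if not entry[1]:
            # Fully acknowledged: the command no longer needs to be retained.
            del self.pending[write_id]

    def replay(self) -> List[Command]:
        # Unacknowledged commands, in the order they were issued.
        return [self.pending[i][0] for i in sorted(self.pending)]


log = SafeWriteLog(replicas={"node-b", "node-c"})
w1 = log.record("SADD", "tags", "redis")
w2 = log.record("SADD", "tags", "cluster")

log.ack(w1, "node-b")
log.ack(w1, "node-c")   # w1 is now acknowledged everywhere
log.ack(w2, "node-b")   # w2 is still missing node-c's ack

# After failover, replay the leftovers against the new master's data.
new_master = {"tags": {"redis"}}
for _, key, member in log.replay():
    new_master[key].add(member)
```

SADD is a good fit for this kind of replay because adding the same member twice, or in a different order, leaves the set in the same state. Commands without that property would need more care.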

So there are plans to improve the consistency model, but it is unlikely that we will go towards the Riak model. The good thing is that there are stores like Riak that are exploring this other way of doing data structures, without application-assisted merge functions, with per-element metadata and so forth.

IMHO, what there is to ponder is that, at the end of the day, a distributed database is the sum of what it offers when there are no partitions and what it offers when there are partitions. Often, improving the latter requires giving up something in the former: simplicity, space efficiency, data model freedom, ...

1 comments

Thanks for the explanation. Is it possible to add something like a "this record was not merged cleanly" flag? Basically, from what I understand, without sync replication you cannot, even in theory, have a system that does the correct merge in all use cases. MySQL would deal with this by letting two replicas contain different data until you scan for this in your application code and detect it. Other systems (I think Cassandra) will attempt to merge things in the background after the partition is gone, but with strategies like "last write wins" or even versioning you can still get bad records. I am thinking of a system that can do whatever strategy you choose, or even just "last write wins", but warns you that a write on this key was lost. That way you can either ignore it if that's appropriate (some types of sensors, for example, that frequently update values) or you can bubble this error up through your application code until a user can fix it. On the other hand, I can definitely see this being a performance issue, so it might be more appropriate for a datastore that is aiming for high consistency while still allowing AP mode.
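The flag being asked about could look roughly like the sketch below. Everything in it is hypothetical: `Versioned`, `lww_merge_with_flags`, and the key names are invented for the example, and it assumes each side's dictionary holds only the writes that side accepted during the partition. Without that assumption, telling a genuinely lost write from a stale copy would need extra version information.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Versioned(NamedTuple):
    timestamp: float
    value: str


def lww_merge_with_flags(
    a: Dict[str, Versioned], b: Dict[str, Versioned]
) -> Tuple[Dict[str, Versioned], Set[str]]:
    """Last-write-wins merge that also reports keys where a write was discarded."""
    merged = dict(a)
    lost: Set[str] = set()
    for key, theirs in b.items():
        ours = merged.get(key)
        if ours is None:
            merged[key] = theirs
            continue
        if ours.value != theirs.value:
            # Both sides wrote different values: one of them is about to be lost.
            lost.add(key)
        if theirs > ours:
            merged[key] = theirs
    return merged, lost


side_a = {
    "user:1234:name": Versioned(100.0, "George"),
    "sensor:7": Versioned(101.0, "21.5"),
}
side_b = {
    "user:1234:name": Versioned(105.0, "Melissa"),
    "user:9:name": Versioned(102.0, "Ann"),
}

merged, lost = lww_merge_with_flags(side_a, side_b)
```

With this input, `lost` contains only `"user:1234:name"`, which the application could either ignore or surface to a user, while the keys written on just one side merge cleanly.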