Hacker News new | ask | show | jobs
Redis Cluster Tutorial (redis.io)
126 points by anuragramdasan 4583 days ago
6 comments

Great post. I uploaded Salvatore keynote from Redis Conf 2013 today. http://blog.togo.io/redisconf/a-short-term-plan-for-redis-by...
Thanks for the link. I have recently started exploring too much into redis at a source code level and its incredibly fascinating.
It's also beautiful code. It's a great project to learn from.
I have a few serious concerns: Suppose have nodes A,B,C,D running with slaves and you have a user u1 start retrieving and storing data. The odds will be high that the user u1 stores keys on all nodes A,B,C,D since all the keys are based on a numerical hashed slot.

1) The user u1 will end up connecting to all nodes. Thus as you scale adding an "E" you always end up connecting to each cluster having the same amount of connections. Example, let's say A,B,C,D have 5k connections open and you add "E". Now "E" has 5k connections. Now you are blocked by the number of requests per second and no amount of hardware can save you.

2) Let's say B and B1 die. (It happens.) Now your system is completely out. All 5k connections are blocked. It would be nice if A,C,D,E continue to run.

I usually get around things like this using an index server between user u1 and A,B,C,D. When u1 starts retrieving keys the system needs to be designed such that u1 only connects to single node for it's keys.

How do you get around 1,2? Do other people use index servers?

Hello, I'm not sure I totally get 1 and 2, maybe sometimes you write "cluster" instead of "node"? Btw I'll try to reply.

1) The amount of connections is not very relevant. You can set a max limit if you want, like with the redis-rb-client that will close old connections before opening new ones if the limit set by the user is less than the number of master nodes. However this is not recommendable as actually what is an idle connection? Just an entry in a data structure of the kernel. So it is not an issue for every client running in a given box to have multiple connections to multiple nodes.

2) The limit in the partition resistance in Redis Cluster is a fundamental design trade off, it means no need for a "merge" operation and that in a given moment only one node is responsible for a given key. I think that in the case of Redis most of the times less partition tolerance in favour of a simpler consistency model is what users need.

EDIT: a clarification on "1", what I mean is that every connection will be active only for the subset of keys it holds, so basically the traffic of every client is split across all its connections. Similarly every node of the cluster only performs a percentage of work proportional to the number of hash slots it holds compared to the full set of 16k hash slots.

EDIT 2: it is not clear from your point "2" if you understand you can have B2, B3, B4, BN if you want.

such that u1 only connects to single node for it's keys. Do other people use index servers?

The redis cluster client in your programming language will know the entire current redis cluster state. The redis cluster client will automatically place (and read) keys directly to (and from) the correct instances. The direct reading also reduces latency found in some other cluster implementations where you only have the option to proxy your requests. You as an application programmer never have to worry about multiple servers.

When your redis cluster client connects to the cluster, the redis client reads the current key slot to server map, and uses that for every action (with updates as necessary as failures/promotions happen).

Let's say B and B1 die.

Good example, but you should probably be running with at a minimum B, B1, and B2 (and three entire copies of A, C, D as well). You have to design your deployments in concert with how your underlying architecture works. Redis cluster instances do not have to take up an entire machine. Run a couple masters and a couple replicas per host. Running a cluster with 4 masters means you should run with 8 replicas (minimum). See the math section of https://github.com/mattsta/redis-cluster-playground for more examples.

It's worth stepping back and defining "Cluster" too. A "database cluster" has no meaning by itself. It doesn't specify redundancy, how availability works, or how replication works. A Riak cluster has nothing to do with a Redis cluster. You have to learn the operations on each type as you go.

About that, redis-trib, the Redis Cluster config utility, while still pretty basic is already able to allocate masters and slaves in different IPs, so if you give it six nodes like 10.0.0.1:6379 10.0.0.1:6380 10.0.0.2:6379 10.0.0.2:6380 10.0.0.3:6379 10.0.0.3:6380, you can expect a master for each IP address, and replicas always allocated in nodes where there is NOT its master.
In general I think you're right but I'm not sure why you would have 5,000 connections to each node. There isn't any reason I can think of to maintain more than 1 connection per client (application process) to each node. You don't gain anything by 'connection pooling' with Redis so it's not like each app server would want to keep a 100 connection pool around. I guess you could have an issue if you have 5000 application processes all wanting to connect to a single redis cluster, but I'd argue that you'd be well past the point of needing to break your system into services by then anyway.
If Redis would now just use names for databases instead of numbers and get rid of the limit of number of databases... Managing multiple application instances (production, test, staging for multiple countries) using a single shared Redis instance but requiring separate databases is a huge pain.
Numbered databases are (de facto) deprecated. Run separate processes, instead.

(Redis is single-threaded, anyways. You probably don't want to run production and staging on the same Redis process.)

Still not a suitable solution IMHO. Now you have to deal with multiple ports, startup scripts etc etc

Every other database on the planet has database names pretty much.

I LOVE Redis but this is just a bonkers limitation IMHO...

None of the reasons why many traditional databases have multi-tenancy support was ever relevant for redis.

And in a world that increasingly moves to containerization you should really look into adding automation to your admin workflow (e.g. ansible). 'port' and 'startup script' should become 'service' and 'template' in your mind.

Now you have to deal with multiple ports, startup scripts etc etc

That's just what happens when you live with production systems. Not every service in the world gives you a dev/staging/production server with "curl github.com/my-bespoke-database/install-and-run.sh"

The last place I worked had 60 physical mysql servers. We had to deal with ports and servers and startup scripts quite a bit.

With tools like Chef, Puppet, Pallet, Ansible, this is a unbelievably solved problem.
I have seen this explanation and frankly I find it ridiculous. Multiple databases as a dictionary layer?
If you have separate stacks for each application instance why wouldn't a separate instance of Redis be part of that stack too? Do you really want some buggy code overloading test-redis and bleeding into production?

More generally, you can always use key-prefixes if you want a name-based way to segregate data in the same redis+db instance.

Along this line I would find named Lua scripts useful, so multiple clients could access script functionality without uploading themselves, or having to retrieve the SHA1 from a dictionary.

This idea may of course be contrary to the expected use of Lua scripts in Redis, and an attempt for the RDBMS part of my brain to look at Lua scripts as stored procedures.

One thing I'd love to see (perhaps in a future iteration?) is a way to have overlapping ranges for redundancy, instead of mirrors.

The big benefit to this is when balancing write-heavy workloads, when the slave would not be getting its share of the total load.

I believe this could be accomplished by running a second DB on each machine as a slave to the next in the cycle, and that's what I'll probably try when we scale up to needing a cluster. But having it be a supported scenario (or facilitated somehow) would make me feel much better.

(edit: Just a feature request of course, exciting to see this release! Thanks for all your hard work antirez!)

I wish redis had synchronous write option to guarantee that when a write response is received that all nodes in the cluster have also received a copy. Even if this was just an ack that data is replicated into memory instead of disk. Sure there are cases where you'd rather operate with async, but really I would prefer to put haproxy in front of 3 - 6 redis nodes and let it load balance both reads and writes similar to galera cluster with keepalived and a vip I would not need to be concerned with split brain assuming synchronous writes...
That's in the plan, it's just not implemented yet. (It's implied under the Note: Redis Cluster in the future will allow users to perform synchronous writes when absolutely needed.)

Something like:

   SYNC 3
   SET key value
   [replies with success only after it's ACK'd on three cluster instances.]
I would definitely take slower write speed to have this extra assurance of high availability with data consistency... as it is today - it's pretty hard operate redis in a HA way even with sentinel... I'm sure others have figured it out but for me the best I have is a 30 second window when sentinel is switching the backup slave to master and either writes are lost haproxy can hold but it works sometimes... I can imagine with synchronous replication we'd have slower write speed but at least failover could be sub-second and possibly zero downtime... using a vip/keepalived - Still redis is such a good datastore, it's difficult to imagine a world/service without it. for what it's worth this looks to be the best setup i've found: http://failshell.io/sensu/high-availability-sensu/ would be interested if others have other setups that possibly work better...
Hey, going a bit off topic about Sentinel, this should not happen. Sentinel + Redis + Clients as a distributed system is not able to guarantee strong consistency with arbitrary partitions clearly, but in the scenario you describe, that is, the master crashes or is partitioned away alone, the window should be like a fraction of a millisecond (the replication link latency) most of the times.

EDIT: to clarify:

1) Sentinels will tell clients master is A.

2) A fails.

3) Sentinels will still tell it is A before failover.

4) Failover happens.

5) Sentinels will finally tell master is B.

could be something that haproxy is causing - when switching masters... maybe I need a script that handles switching the vip when sentinel detects failure in master instead of haproxy with a backup... btw - thank you for redis.
You shouldn't need haproxy in the loop at all.

If your redis client is Sentinel-aware, you ask your client directly "give me the redis master server" and it asks Sentinel which server is master. The only ip address involved for your configuration is the address of the sentinel(s).

haproxy will take 30 seconds to detect your failure. Sentiel will notify you immediately. Sentinel is the one determining the exact state of the master election, and it'll be very chatty when a new master replaces the old one.

seems like redis cluster is about turning redis into riak
Redis and Riak are from the point of view of distributed systems extremely different beasts. Both are useful in my opinion but the semantics is very different since Riak is basically an implementation of Amazon Dynamo paper. Dynamo-alike stores like Cassandra and Riak are key -> small_value stores that are available even under severe partitions, and where there is not a single node responsible for a given key. Application-assisted merge operations are used in order to merge values later when the network partition heals.

Redis have instead a master - replicas setup that can't usually resist severe partitions. In general only the side of the partition with the majority of masters survive assuming that there is at least a slave for every master that is not in the majority partition.

In practical terms this means that a few nodes going away usually result in the cluster being still available (unless you are really unlucky and all the replicas for a given hash slot go down at the same time), but that under severe partitions the system is likely unavailable.

The tradeoff makes Redis cluster able to work without help from the application, and without merge operations: this is very desirable because of the kind of complex-to-merge and big (millions of elements are common) data structures that a single key can hold in Redis.

In what way?

Some variant of clustering is a very common feature among many different database platforms.

Redis adding clustering does not make it a step towards turning it into Riak any more than it is a step towards turning it into MySQL [1]

[1] http://www.mysql.com/products/cluster/