by paladin314159
3954 days ago
Redis is a great piece of software, and we leverage it for several use cases outside of managing sets. For our use case, there were a few blockers that prevented Redis from being a viable solution:

1. It's tricky to scale out a Redis node when it gets too big. Because RDB files are just a single dump of all data, it's not easy to make a specific partitioning of the dataset. This was a very important requirement for us in order to ease scaling (redis-cluster wasn't ready yet -- we've been following that carefully).

2. When you store hundreds of GB of persistent data in Redis, the startup process can be very slow (restoring from RDB/AOF). Since it can't serve reads or writes during this time, you're unavailable (setting up a slave worsens the following problem).

3. The per-key overhead in Redis (http://stackoverflow.com/questions/10004565/redis-10x-more-m...). We have many billions of sets that are often only a few elements in size -- think of slicing data by city or device type -- which means that the resulting overhead can be larger than the dataset itself.

If you think about these problems upfront, they're not too difficult to solve for a specific use case (partition data on disk, allow reads from disk on startup), but Redis has to be generic and so can't leverage the optimizations we made.
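To make the "partition data on disk" idea concrete, here is a minimal sketch of stable hash partitioning. It is not the implementation described above: the partition count, the file naming scheme, and the helper names `partition_for` and `partition_path` are all hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical partition count; a real system would choose this based on
# how finely the data needs to be split across nodes.
NUM_PARTITIONS = 64


def partition_for(set_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a set key to a partition index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, and the mapping has to survive
    restarts.
    """
    digest = hashlib.md5(set_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_path(set_key: str, data_dir: str = "data") -> str:
    """Return the (hypothetical) on-disk file that holds this key's partition."""
    return f"{data_dir}/part-{partition_for(set_key):04d}.db"
```

Because every key maps to exactly one partition file, a single partition can in principle be copied to another node without dumping the whole dataset, and a node could serve reads from a partition file before everything is loaded into memory.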
Regarding the sets database: I had to solve quite a similar problem at the company where I work, and I chose the Redis HyperLogLog structure instead of sets, because for near-real-time results you only need an approximate count of the sets or of their intersection, and you don't need to know the specific set members. I just wanted to let you know that it works great for us when doing intersections (PFMERGE) on sets containing hundreds of millions of members. If anybody is interested, I can do a writeup about it.
Did you ever consider using that?
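
For readers unfamiliar with the approach: HyperLogLog commands only give you union cardinalities (PFMERGE stores a union, and PFCOUNT with several keys counts their union), so an intersection count is typically derived by inclusion-exclusion. The sketch below shows that derivation; it is one plausible reading of the setup described above, not a confirmed description of it. The function name `estimate_intersection` is hypothetical, and `client` is assumed to be any object with a redis-py style `pfcount(*keys)` method.

```python
def estimate_intersection(client, key_a: str, key_b: str) -> int:
    """Estimate |A ∩ B| for two HyperLogLog keys via inclusion-exclusion.

    Assumes `client` exposes pfcount(*keys) with PFCOUNT semantics:
    one key returns that key's estimated cardinality, several keys
    return the estimated cardinality of their union.
    """
    count_a = client.pfcount(key_a)
    count_b = client.pfcount(key_b)
    count_union = client.pfcount(key_a, key_b)
    # |A ∩ B| = |A| + |B| - |A ∪ B|. Each term is an estimate, so the
    # result can dip below zero for nearly disjoint sets; clamp it.
    return max(0, count_a + count_b - count_union)
```

With redis-py this would be called as `estimate_intersection(redis.Redis(), "visitors:sf", "visitors:ios")`, where both key names are made up for the example. One caveat: the error of each count scales with the size of the sets involved, so the estimate is least reliable when the intersection is small relative to the union.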