by kijin 4741 days ago
> Q: What about data invalidation if I move servers, change the config etc.

> A: It's minimum. At least better than: Node = Hash(key) MOD N

Seriously?

Redis is not just a cache. Let me repeat that. Redis is not just a cache. Lots of people use Redis as a primary data store. If Redis is your primary data store, you can't afford to have any of your keys invalidated, ever. "Minimum" might be good enough when you're talking to Memcached, but it's not enough when you're talking to Redis.

Consistent hashing has been a solved problem for a long time if you can afford to misplace a few keys from time to time, which will happen every time a node is added to or removed from the pool. There are Redis client libraries implementing consistent hashing in nearly every language, and most of them work just fine if you use Redis as a cache or if your pool size never changes. Solving this problem again isn't particularly interesting.
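To make the "misplace a few keys" trade-off concrete, here is a minimal consistent-hashing ring in Python. It is a sketch under stated assumptions: ring positions come from MD5 and each node gets 100 virtual points (both arbitrary choices), `HashRing` is a hypothetical helper, and the node names `redis-a` through `redis-e` are made up, with no real Redis connection involved. It counts how many keys change owner when a fifth node joins a four-node pool.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # Map a string to a 32-bit position on the ring. MD5 is an
    # illustrative choice; any well-mixed hash works.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Scatter `replicas` virtual points for this node around the ring.
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._points, ring_hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


# Hypothetical four-node pool growing to five.
keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["redis-a", "redis-b", "redis-c", "redis-d"])
before = {k: ring.node_for(k) for k in keys}
ring.add("redis-e")
moved = [k for k in keys if ring.node_for(k) != before[k]]
fraction_moved = len(moved) / len(keys)
print(f"{fraction_moved:.0%} of keys moved")
```

Every key that moves lands on the new node and the rest stay put; with five roughly equal nodes, about a fifth of the keys should move. If Redis is only a cache, those keys are simply cache misses; if it is the primary store, they are lost unless something copies them over.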

What really would be interesting is a server that sits between Redis nodes and clients and intelligently moves keys from one node to another in the background so that no key is ever invalidated even when the pool size changes. I believe that project is called Redis Cluster or something. That might be worth an extra TCP connection. But right now, I'm not seeing why I should prefer Redis-Router to any tried-and-true client library with built-in lossy consistent hashing.
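A background-migrating middle layer of that kind could look roughly like the following sketch. It is entirely hypothetical: plain dicts stand in for Redis nodes, only adding a node is handled, and a simple CRC32-mod placement keeps it short (a consistent-hash ring would reduce how many keys have to move). The idea it shows is copy-then-delete plus a read fallback to the previous owner, so no key is ever unreadable during a rebalance.

```python
import zlib


def owner(key, node_names):
    # Placement function. CRC32 MOD N keeps the sketch short; a
    # consistent-hash ring would leave far fewer keys to migrate.
    return node_names[zlib.crc32(key.encode("utf-8")) % len(node_names)]


class MigratingPool:
    """Hypothetical middle layer; in-memory dicts stand in for Redis nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.names = list(node_names)
        self.old_names = None  # previous pool while a rebalance is running

    def set(self, key, value):
        # Writes always go to the key's owner under the new pool.
        self.nodes[owner(key, self.names)][key] = value

    def get(self, key):
        store = self.nodes[owner(key, self.names)]
        if key in store:
            return store[key]
        if self.old_names is not None:
            # Not migrated yet: read from the node that owned it before.
            return self.nodes[owner(key, self.old_names)].get(key)
        return None

    def add_node(self, name):
        # Start a rebalance; only one may run at a time in this sketch.
        self.old_names, self.names = self.names, self.names + [name]
        self.nodes[name] = {}

    def migrate_step(self, batch=100):
        """Move up to `batch` misplaced keys and return how many moved."""
        moved = 0
        for name, store in self.nodes.items():
            for key in list(store):
                target = owner(key, self.names)
                if target == name:
                    continue
                # Copy before deleting so the key is never unreadable;
                # setdefault keeps a newer value already written to target.
                self.nodes[target].setdefault(key, store[key])
                del store[key]
                moved += 1
                if moved == batch:
                    return moved
        self.old_names = None  # nothing misplaced remains
        return moved


pool = MigratingPool(["redis-a", "redis-b", "redis-c"])
for i in range(1000):
    pool.set(f"user:{i}", i)
pool.add_node("redis-d")
pool.set("user:7", "updated")  # a write that arrives mid-rebalance
while pool.migrate_step(batch=50):
    # Between steps, every key is still readable.
    assert all(pool.get(f"user:{i}") is not None for i in range(1000))
```

A real implementation would also have to deal with deletes, concurrent clients, and failures halfway through a copy, which is where most of the difficulty lies.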


> Redis is not just a cache. Let me repeat that. Redis is not just a cache. Lots of people use Redis as a primary data store. If Redis is your primary data store, you can't afford to have any of your keys invalidated, ever. "Minimum" might be good enough when you're talking to Memcached, but it's not enough when you're talking to Redis.

"minimum" is enough for the people who use redis for cache, since a lot of people use it for just, plain, simple, old "caching".

Nowhere is it suggested that this is suitable for uses where you need durability. In fact, if you are looking for a consistent-hashing library, I'd assume you understand the drawbacks (invalidation on resize).

Have you a point other than ranting that this isn't a unicorn? It really bothers me that this is the top comment at the moment.

My point is that I don't see much reason to run a server whose sole purpose is to perform consistent hashing. I have no complaints about using the client library as part of a Python program. Actually, apologies to the OP: I didn't realize that this is apparently the first Python Redis client library with consistent hashing built in.

But a standalone server to talk to other languages such as PHP? Why would I want to add yet another TCP connection, yet another point of failure, and yet another protocol to my software stack when PHP's very own Predis, for example, does consistent hashing just fine while talking to Redis directly? Many other languages, such as Ruby, Java, and C#/.NET, also have Redis clients that support consistent hashing. Sorry, Pythonists, but everyone else has been having fun with turnkey consistent hashing for 3-4 years already.

Lossless sharding came to mind immediately as a possible benefit of a middle layer, because Redis users have been asking for something like that for ages. When someone says "Redis" and "sharding" in the same breath, I'm sure a lot of people will think "Finally, a way to distribute my larger-than-RAM dataset across multiple machines!" After all, durability is a big deal when it comes to Redis. I'm sorry if my comment came across as rude, but I was honestly quite disappointed because my expectations were probably too high.

What you're describing sounds a lot like MongoDB: it has a router (mongos) that sits between the client and the upstream databases (mongod), tracks metadata about where data is located, and moves that data around based on shard keys.
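For contrast with pure hashing, metadata-based routing in the spirit of mongos can be reduced to a lookup table of shard-key ranges. This is a toy sketch, not how MongoDB actually implements it: `RangeRouter`, the chunk boundaries, and the shard names are hypothetical, and chunk splitting, metadata versioning, and the data copy itself are left out.

```python
import bisect


class RangeRouter:
    """Toy metadata router mapping shard-key ranges ("chunks") to shards."""

    def __init__(self, chunks):
        # chunks: (lower_bound, shard) pairs sorted by lower bound; the first
        # bound must be the smallest possible key so every key has a chunk.
        self.bounds = [lower for lower, _ in chunks]
        self.shards = [shard for _, shard in chunks]

    def shard_for(self, shard_key):
        # The owning chunk is the last one whose lower bound <= shard_key.
        return self.shards[bisect.bisect_right(self.bounds, shard_key) - 1]

    def move_chunk(self, lower_bound, new_shard):
        # Once the chunk's data has been copied, only this entry changes;
        # routing for every other chunk is unaffected.
        self.shards[self.bounds.index(lower_bound)] = new_shard


router = RangeRouter([("", "shard0"), ("m", "shard1"), ("t", "shard2")])
print(router.shard_for("alice"))    # shard0
print(router.shard_for("mallory"))  # shard1
router.move_chunk("m", "shard3")
print(router.shard_for("mallory"))  # shard3
```

Moving data then means copying one chunk and updating one metadata entry, rather than changing a placement function that every key depends on; the price is that the metadata itself must be stored, kept consistent, and consulted on every request.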

Unfortunately, it's the weakest part of MongoDB.

Okay, can you tell me a Python library that comes with consistent hashing?

(For the record, HASH(key) MOD N is not "consistent hashing".)
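The difference shows up on resize. A quick count of how many keys change nodes under HASH(key) MOD N when a four-node pool grows to five, using MD5 as an illustrative hash and made-up key names:

```python
import hashlib


def mod_n_node(key: str, n: int) -> int:
    # HASH(key) MOD N placement; MD5 stands in for whatever hash a client uses.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return h % n


keys = [f"user:{i}" for i in range(10_000)]
moved = sum(mod_n_node(k, 4) != mod_n_node(k, 5) for k in keys)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of keys change nodes going from 4 to 5 nodes")
```

A key stays put only when its hash has the same remainder mod 4 and mod 5, which happens for about 4 out of every 20 hash values, so roughly 80% of keys move. With a consistent-hash ring, only about 1/(N+1) of them, here around 20%, should move.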

redis-router is just a library that wraps redis-py with consistent hashing. Nothing more.

I use it heavily in production since it solves the client-side sharding problem for me. When I wrote it, there was no trustworthy client library that came with consistent hashing.

I don't actually know what you want to see. "Saving the world" is on the to-do list, though. Wait for the new releases; you might like them. :)

> okay, tell me a python library comes with consistent hashing?

Nydus uses the same Ketama algorithm that you use, but I suppose it might not have had that feature when you started to work on Redis Router.

Also, pretty much every up-to-date client library in nearly every popular language, such as PHP, Ruby, C#, and Java, supports it.

I'm not complaining about the fact that you created a neat Python library for Redis. My complaint is about the standalone server feature. I don't see why it's needed because nearly every popular language has native libraries that implement consistent hashing (not HASH MOD N), often using the exact same algorithm you're using. So I went looking for bells and whistles that might justify the standalone server, and unfortunately I found none.