Hacker News new | ask | show | jobs
by alxv 3276 days ago
You are missing a crucial piece here to have consistent hashing: you also need hash the names of the servers. With consistent hashing you hash both the names of the requests and of the servers, then you assign the request to the server with closest hash (under the modulus). With this scheme, you only need to remap 1/n of the keys (where n is the number of servers).
2 comments

You're kind of right. You can also use something like jump consistent hash [0] which only requires you to have a consistent ordering of the hosts where you're sending the information. We (Facebook) use something similar for our caches. It requires a linear array of hosts but you've already got that if you're load balancing.

[0] https://arxiv.org/abs/1406.2294

That makes a lot of sense, thanks.

Better consistent hashing means that existing servers don't have their caches invalidated, but the new servers that were just added start with empty caches anyway so are fielding all uncached requests. Hopefully the bottleneck is actually with some shared layer behind it (a database or something) otherwise I guess you'd need to come up with a more complex way to slowly distribute more traffic to the new nodes.