Hacker News new | ask | show | jobs
by gtrubetskoy 3916 days ago
I'm curious about the "threaded redis" reference. Back a couple of years ago I built a Collaborative Filtering recommendation system that used Redis for graph storage and relied heavily on sorted sets to compute the recommendation right in Redis.

I really needed some kind of a parallelism and so I hacked together http://thredis.org/ (and then mostly for fun added SQL operations to it by linking it with SQLite).

Since then I've kind of abandoned this project and moved on to other things, but I still think that there is a valid case for some form parallelism in Redis. I had learned some tough lessons while hacking on Thredis such as importance of ordering locks, having retry strategies, and there are still bugs that can cause it to crash AFAIK, but the take away was that it's doable - I was a newbie at it, today I'd probably do a much better job. In my (not so scientific) testing Thredis was only slightly slower than Redis.

1 comments

I remember thredis very well! But it's different compared to what memcached does and Redis has plans for: memcached just threads the I/O part, not the access to the key space which is serialized via a mutex. However what you had in mind is also in our long term plans... and was addressed in another blog post here: http://antirez.com/news/93
The locks are becoming more fine-grained in memcached [1], so that should be less of a problem now.

It is possible to remove lock contention on the read path [2] if a concurrent hash table is used. This can be done while using an O(1) eviction policy that outperforms LRU [3].

[1] https://github.com/memcached/memcached/pull/97 [2] https://github.com/ben-manes/caffeine/wiki/Design [3] https://github.com/ben-manes/caffeine/wiki/Efficiency

NovaX: thanks for the interesting references. The point is, is it worth for memcached to avoid the global interpreter lock in the hash table with the number of cores currently deployed machines have? I would expect to see very little contention. The concurrent hash table looks a good idea for memcached, for sure to have a mutex per key would be likely an overkill in terms of memory usage. I'll try to read with care the links you provided, thank you.
As you said elsewhere, the network I/O is the primary bottleneck. There are a lot of different hashtable designs (so per-key locks not required), but fine grained locking of the table/LRU is probably enough. Since an in-app cache has a different perf profile, the latter two links summarize my work.
I wrote up a fairly detailed explanation of the Thredis locking strategy, it took me a while to realize that a lot fewer locks than it seems at first is needed: https://github.com/grisha/thredis/blob/c9207a373c3ff0c9fb70a...