Hacker News
by nxb 3923 days ago
Is there a document that lists out Redis best practices like "Redis is very fast as long as you use O(1) and O(log_N) commands"?

Sure it's probably all obvious things, but it would be nice to have a checklist to skim over, to be sure I haven't forgotten any major consideration when designing a new system.

3 comments

Hello nxb (but this comment also addresses jnewland's post). In the documentation we have, for each command, the time complexity, and for certain commands like KEYS an explicit warning against using them in production.

However, the best tool for debugging a deployment that misbehaves is the LATENCY command. Once latency monitoring is enabled, calling LATENCY DOCTOR makes Redis create an automatic report about all the possible sources of latency in a given application/setup by collecting and analyzing live data.

The latency reporting is explained here: http://redis.io/topics/latency-monitor
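For anyone who wants to try this from code, here is a minimal sketch in Python. It assumes a client object exposing `execute_command(*args)`, which is how redis-py sends raw commands; the helper names and the 100 ms threshold are made up for illustration, not taken from the docs.

```python
def enable_latency_monitor(client, threshold_ms=100):
    """Turn on latency monitoring for events slower than threshold_ms."""
    # CONFIG SET latency-monitor-threshold <ms> enables the monitor at runtime.
    # The default of 100 ms here is an arbitrary example value.
    return client.execute_command(
        "CONFIG", "SET", "latency-monitor-threshold", threshold_ms
    )


def latency_report(client):
    """Ask Redis for its human-readable latency analysis (LATENCY DOCTOR)."""
    report = client.execute_command("LATENCY", "DOCTOR")
    # Clients that do not decode responses hand back bytes.
    return report.decode() if isinstance(report, bytes) else report


# Usage, assuming the redis-py package and a reachable server:
#   import redis
#   r = redis.Redis()
#   enable_latency_monitor(r, threshold_ms=100)
#   print(latency_report(r))
```

The report is only useful after the monitor has been on for a while under real traffic, since it analyzes the spikes it has collected.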

Moreover, the sources of latency, and a quick list of things to check in order to avoid 99% of the problems, are also listed here:

http://redis.io/topics/latency

The first section is a TL;DR synopsis listing everything very important.

In general the "slow log" feature, the latency doctor and reporting abilities, the software watchdog, and the documentation which explicitly provides time complexity for each command, should be quite a good amount of tools and information in order to avoid problems. Obviously, as with any popular tool like Redis, there is a percentage of users that will not read the doc and just deploy it. This is unavoidable, but at least when they run into issues we try to provide them with the tools needed.

Those tools are certainly useful! In the recent production incident I mentioned, we were able to quickly use SLOWLOG to determine which calls were causing the latency spikes. Thanks deeply for providing all of these useful tools and docs.
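As a rough illustration of that kind of triage, the sketch below groups slow log entries by command name and ranks them by total time. It assumes each entry is a dict with a `command` (str or bytes) and a `duration` in microseconds, which is the shape redis-py's `slowlog_get()` returns; the function name is hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow log entries by command name and total their durations.

    Each entry is assumed to be a dict with a "command" (str or bytes)
    and a "duration" in microseconds.
    """
    totals = defaultdict(lambda: {"count": 0, "total_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode()
        # The first token is the command name, e.g. "LRANGE" in "LRANGE jobs 0 -1".
        name = command.split()[0].upper() if command.strip() else "(empty)"
        totals[name]["count"] += 1
        totals[name]["total_us"] += entry["duration"]
    # Worst offenders (largest total time) first.
    return sorted(totals.items(), key=lambda kv: kv[1]["total_us"], reverse=True)


# Usage, assuming redis-py:
#   import redis
#   for name, stats in summarize_slowlog(redis.Redis().slowlog_get(128)):
#       print(name, stats["count"], stats["total_us"])
```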

> there is a percentage of users that will not read the doc and just deploy it

I wonder what this percentage is? I'd wager that it is higher than you might have anticipated, especially given the other comments on this thread.

As an engineer on a team ultimately responsible for the availability of a production service, it's my responsibility to ensure that the percentage of engineers that know the latency side effects of any Redis calls they make is near 100%. In the presence of such variable latency, any means of making that variability more obvious to all users of Redis would be a positive step towards happy users and operators.

I understand your point of view. Unfortunately I think Redis in this regard is a tool where helping is hard: I try to provide documentation, but it is the kind of tool that looks superficially so easy to use, yet you need some understanding in order to really use it effectively and deploy it safely. Part of this is due to the fact that it 1) uses uncommon tradeoffs and 2) is a set of commands without a predefined use case, so there is tons to invent on both the good and the bad side :-)

This line jumped out at me as well. After a recent production incident determined to have been heavily influenced by several thousand O(N) commands being called against a list several orders of magnitude larger than normal, I'm thinking about this a lot.

I'm confident many other users of Redis are using O(N) operations in production against small datasets without knowledge of how much latency will be introduced by those operations when that dataset grows. This is exactly the kind of situation that makes me immediately skeptical when I find Redis in an emergent system design.

I'm considering what initially felt like a draconian means of remediation: using rename-command to rename all O(N) commands to BIG_O_N_$COMMAND to ensure everyone using them knows the possible impact and to allow for easy detection during code review and/or Redis latency spikes.
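A minimal sketch of how those directives could be generated, assuming the `rename-command <old> <new>` syntax from redis.conf. The command list is an illustrative subset I picked, not an exhaustive list of O(N) commands, and the function name is made up.

```python
# Illustrative subset only; check each command's documented time complexity
# before deciding what to rename.
O_N_COMMANDS = ["KEYS", "SMEMBERS", "HGETALL", "LRANGE", "SORT"]


def rename_directives(commands, prefix="BIG_O_N_"):
    """Return redis.conf lines that rename each command to prefix + name."""
    return [f"rename-command {name} {prefix}{name}" for name in commands]


# Usage: paste the output into redis.conf and restart the server.
#   print("\n".join(rename_directives(O_N_COMMANDS)))
```

Callers would then have to send the renamed command explicitly (for example `BIG_O_N_KEYS` instead of `KEYS`), which is what makes the cost visible in code review.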

The more I think about this approach, the more I feel that this should be the default mode of operation for Redis in production. SREs around the world would collectively save decades of time if every engineer writing Redis queries knew this fact by heart:

> Redis is very fast as long as you use O(1) and O(log_N) commands. You are free to use O(N) commands but be aware that it’s not the case we optimized for, be prepared for latency spikes.

> "Redis is very fast as long as you use O(1) and O(log_N) commands"

Well, like you say, to me that particular practice would be pretty obvious. More helpful would be a page that shows the time and space complexity of the various commands, akin to the following page for Python:

https://wiki.python.org/moin/TimeComplexity

Edit: Yeah, like the Redis documentation, lol. I think it says a lot for Redis that I almost never need to visit the docs, other than the single page that lists all the commands (back when it was red was the last time I actually needed to look at it). Once you know the commands, quickly reading the changelogs provides all you need. Anyway, all that to say I had no idea Redis already provided this complexity information; I don't remember them having this back when I first learned Redis (although maybe they did). Good to know.

You mean, akin to the redis documentation?

http://redis.io/commands/sort

"Time complexity: O(N+M*log(M)) where N is the number of elements in the list or set to sort, and M the number of returned elements. When the elements are not sorted, complexity is currently O(N) as there is a copy step that will be avoided in next releases."
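To get a feel for what that formula means in practice, here is a rough sketch that plugs numbers into N + M*log(M). The log base is an assumption (base 2), constant factors are ignored, and the function name is hypothetical, so the result is only useful for comparing relative growth.

```python
import math


def sort_cost(n, m):
    """Rough operation count for SORT: N + M*log(M), using log base 2.

    n: number of elements in the list or set being sorted.
    m: number of returned elements.
    """
    if m <= 0:
        # Nothing returned: only the O(N) part remains.
        return float(n)
    return n + m * math.log2(m)


# Usage: with m fixed, the N term dominates as the collection grows.
#   sort_cost(1_000, 10)
#   sort_cost(1_000_000, 10)
```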

This is actually in the command documentation, e.g. http://redis.io/commands/exists

> Time complexity: O(1)