Hacker News
by koolba 1406 days ago
> One of those limits is that you really, really, really don't want to go outside of RAM. Think about what is stored, and be sure not to waste space. (It is surprisingly easy to leak memory.)

You can have massive amounts of RAM these days. You’re sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding.

> And third, it is worth spending time mastering Redis data structures.

Bingo. The true secret to properly using Redis: understanding the big-O complexity of each operation (…and ensuring that none of your interactions are more than logarithmic).
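As a rough illustration of that habit, here is a small sketch that checks the commands a code path uses against their documented complexities. The table covers only a handful of commands, and `risky_commands` is a hypothetical helper, not part of any Redis client.

```python
# Documented time complexities for a few Redis commands.
# N is the size of the collection (or of the whole keyspace for KEYS).
# SADD and ZADD are listed per element added.
COMPLEXITY = {
    "GET": "O(1)",
    "SET": "O(1)",
    "HGET": "O(1)",
    "LPUSH": "O(1)",
    "SADD": "O(1)",
    "ZADD": "O(log N)",
    "ZRANK": "O(log N)",
    "SMEMBERS": "O(N)",
    "HGETALL": "O(N)",
    "KEYS": "O(N)",
}

ACCEPTABLE = {"O(1)", "O(log N)"}


def risky_commands(commands):
    """Return the commands whose listed complexity is worse than logarithmic.

    Commands missing from the table are returned too, so they get looked up
    rather than silently assumed to be cheap.
    """
    return [c for c in commands if COMPLEXITY.get(c.upper()) not in ACCEPTABLE]


print(risky_commands(["get", "zadd", "smembers", "keys"]))
```

An O(N) command is not automatically a problem on a small collection; the point is to know which calls grow with the data.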


> You can have massive amounts of RAM these days. You’re sooner to hit big-O limits from bad architectural decisions than run out of memory. If you do get to that point you likely have enough value in your usage to justify scaling out further and sharding.

Absolute disagreement.

It is very easy to accidentally leak a few hundred MB per week in a busy Redis system. The code will look and work fine... at first. It is correspondingly hard to track down and clean up the leak a few months later (particularly if there are multiple such leaks to track down). Yes, you can go for years just buying larger and larger EC2 instances. But that will also come with a shocking price tag.

I know of a number of organizations that this happened to. And pretty much every bad Redis story I hear about had this as a root cause. That is why I brought it up as an important consideration.

Yes, this matches my experience.

Redis excels as a memcached alternative with some useful operations. Where people get into trouble with Redis is treating it as a persistent data store: despite its ability to replicate and persist, Redis has some constraints you need to work within. At best, think of Redis as something that can hold a materialized view, but one that can become corrupted at any random time, so you'll need the ability to rematerialize it from something else. And second, you absolutely have to be conscious of how close you are to RAM limits.

Redis is production-ready and it has a lot of features to help you track down problems with either memory or CPU usage. For example: `redis-cli --bigkeys` will help you find the very large keys. For smaller keys that occur too often, sampling a few hundred keys should be sufficient to help you find what type of keys are taking more space than necessary.
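The sampling idea can be sketched roughly as follows. This is only an illustration under assumptions: `key_sizes` is a plain dict standing in for a keyspace (on a live server the sizes could come from something like `SCAN` plus `MEMORY USAGE`), keys are assumed to follow a `prefix:id` naming convention, and `sample_usage_by_prefix` is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_usage_by_prefix(key_sizes, sample_size=300, seed=0):
    """Estimate which key families take the most space from a random sample.

    key_sizes: mapping of key name -> size in bytes (stand-in for a keyspace).
    Returns (prefix, sampled_count, sampled_bytes) tuples, largest bytes first.
    """
    rng = random.Random(seed)
    keys = list(key_sizes)
    sample = rng.sample(keys, min(sample_size, len(keys)))

    totals = defaultdict(lambda: [0, 0])  # prefix -> [count, bytes]
    for key in sample:
        prefix = key.split(":", 1)[0]  # assumes "prefix:id" style key names
        totals[prefix][0] += 1
        totals[prefix][1] += key_sizes[key]

    rows = [(prefix, count, size) for prefix, (count, size) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up keyspace: many small user records, equally many larger sessions.
fake_keyspace = {f"user:{i}": 100 for i in range(1000)}
fake_keyspace.update({f"session:{i}": 2000 for i in range(1000)})

for prefix, count, size in sample_usage_by_prefix(fake_keyspace):
    print(prefix, count, size)
```

A few hundred samples will not catch rare keys, but it is usually enough to show which family of keys dominates.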

Once you get the Redis database designed well, there are a lot of things you can do before hitting the limit where you can't install any more RAM in a new machine. For example, there are no more than a billion .com domains out there. Say a single record takes 100 bytes on average, consisting of the domain name and a glue record pointing to the IP of its authoritative DNS server. Then it takes just 100GB of memory to store enough information to handle all queries to .com domains in the world. It's not so hard to obtain a machine with 768GB of memory these days, and 2TB machines are not uncommon.
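The arithmetic behind that estimate, spelled out. The record count and size are the figures from the comment; `overhead_factor` is a hypothetical knob for per-key overhead, and the value 2.0 is only an illustration, not a measured number.

```python
# Back-of-the-envelope check of the .com estimate.
DOMAINS = 1_000_000_000   # upper bound on .com domains used in the comment
BYTES_PER_RECORD = 100    # average record size used in the comment


def dataset_gb(records, bytes_per_record, overhead_factor=1.0):
    """Decimal gigabytes needed for the data set.

    overhead_factor is a hypothetical multiplier for per-key overhead.
    """
    return records * bytes_per_record * overhead_factor / 1e9


print(dataset_gb(DOMAINS, BYTES_PER_RECORD))                       # 100.0
print(dataset_gb(DOMAINS, BYTES_PER_RECORD, overhead_factor=2.0))  # 200.0
```

Even with the footprint doubled, the data set stays well under the 768GB machine mentioned above.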

And if you worry about the price tag - don't use EC2. You can rent a dedicated server with 1TB of RAM at https://www.hetzner.com/dedicated-rootserver/ax161/configura... for $600 per month. At Scaleway you can rent one for $1000 per month: https://www.scaleway.com/en/pricing/?tags=baremetal,availabl.... AWS is notoriously hard to make cost-effective.

You can also "leak" rows in a traditional RDBMS or even a filesystem. Why is this particularly notable for Redis?
Redis starts to have issues at high scale, even on sophisticated hardware, that can be quite difficult to debug without a lot of additional effort and storage. It’s not just memory, but odd behavior (e.g. randomly dropped connections) with a lot of connected clients, or hot keys/nodes in a cluster configuration, etc.

These issues can exist in any system, but in my experience it’s especially tough (relatively) to identify and diagnose them with Redis. Once you add lua script usage it can get even worse.

A traditional RDBMS or filesystem is designed for high throughput and concurrency, even if some tasks are blocked on data. Additionally, both have options to partition steadily growing things, if needed with old partitions being moved to tape backup while the server continues running.

Redis is a single threaded program acting against RAM whose philosophy is that it does things fast then moves to the next job. If it needs to access memory that got paged to disk, the whole server stops and waits to get it. Nobody can do anything.
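A toy model of that effect, with made-up numbers: commands run strictly one after another, so a single slow one (say, a page fault) delays everything queued behind it. `completion_times` is a hypothetical helper and the durations are illustrative, not measurements.

```python
def completion_times(durations_us):
    """Run commands one at a time in arrival order.

    durations_us: how long each command takes, in microseconds.
    Returns the time at which each command finishes.
    """
    now = 0
    finished = []
    for duration in durations_us:
        now += duration        # nothing else runs while this command executes
        finished.append(now)
    return finished


# Three commands arrive together; the middle one stalls for 50 ms.
print(completion_times([100, 50_000, 100]))  # [100, 50100, 50200]
```

The third command needs only 100 microseconds of work but finishes after 50.2 ms because it waited behind the stalled one.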

Because Redis doesn't have to deal with locking and concurrency, it can run much faster on the same resources. But when concurrency is required, it is stuck because it doesn't have it.

> You can have massive amounts of RAM these days.

True, but I am finding that balancing CPU and RAM can be tricky. Slapping 128GB on a 1-core machine means you quickly have CPU limitations.

Redis is single-threaded and will have no problem saturating a 10G NIC with a single socket.
My concern is how long it takes a CPU to scan through all of that memory.
What "scanning"? That's not how memory access works in a K/V store, and Redis does very little work that demands much of the CPU.
There are workloads that will saturate a redis instance's CPU: using it as an LRU cache, eventually you will hit the configured memory limits and adding new keys will require finding old keys to delete. Eventually it may also require redis to do memory defragmentation which can be fairly intensive.
> There are workloads that will saturate a redis instance's CPU

I might imagine this scenario if you're excessively using SMEMBERS and a few other slow ops, but I have yet to see CPU issues outside of bad EVALs.

> require finding old keys to delete

LRU/LFU eviction is not particularly CPU intensive.
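For context, Redis documents its LRU as an approximation: it samples a small number of keys (the `maxmemory-samples` setting) and evicts the least recently used among them, rather than tracking exact order. A much-simplified sketch of that idea follows. `SampledLRUCache` is a hypothetical class, it counts keys instead of bytes, and it is not how Redis is implemented internally.

```python
import random


class SampledLRUCache:
    """Toy cache that evicts by sampling, in the spirit of approximated LRU."""

    def __init__(self, max_keys, samples=5, seed=0):
        self.max_keys = max_keys
        self.samples = samples
        self.entries = {}   # key -> [value, last_used_tick]
        self.keys = []      # all keys, so a random pick is O(1)
        self.pos = {}       # key -> index in self.keys
        self.tick = 0
        self.rng = random.Random(seed)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        self.tick += 1
        entry[1] = self.tick            # record the access
        return entry[0]

    def set(self, key, value):
        self.tick += 1
        if key in self.entries:
            self.entries[key] = [value, self.tick]
            return
        if len(self.keys) >= self.max_keys:
            self._evict_one()
        self.entries[key] = [value, self.tick]
        self.pos[key] = len(self.keys)
        self.keys.append(key)

    def _evict_one(self):
        # Look at a fixed number of random keys, not the whole keyspace.
        picks = [self.keys[self.rng.randrange(len(self.keys))]
                 for _ in range(self.samples)]
        victim = min(picks, key=lambda k: self.entries[k][1])
        self._remove(victim)

    def _remove(self, key):
        # Swap-remove keeps deletion O(1).
        idx = self.pos.pop(key)
        last = self.keys.pop()
        if last != key:
            self.keys[idx] = last
            self.pos[last] = idx
        del self.entries[key]


cache = SampledLRUCache(max_keys=100)
for i in range(1000):
    cache.set(f"k{i}", i)
print(len(cache.entries))  # 100
```

Each eviction touches at most `samples` keys regardless of how many are stored, which is why this style of eviction stays cheap per write.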

> redis to do memory defragmentation which can be fairly intensive

Active defrag has relatively negligible overhead, and assuming jemalloc even more so.

> understanding the big-O complexity of each operation (…and ensuring that none of your interactions are more than logarithmic).

This is a good idea, maybe a prompt for another post.