by Sinjo 3182 days ago
It's been a long time since I looked into this: is there now a way to configure a cluster of Redis instances such that you won't lose messages on node failure? If not, all the nice at-least-once delivery (or "effectively once" when you add message dedupe) you get with something like Kafka/Kinesis/GCP PubSub is gone.

If not, either people's messages don't matter /that/ much (which is fine, just not great for most of my use cases at the moment) or everyone's in for another round of "oh shit, where did the data go?"

Edit: Just in case we end up in CP vs AP datastore wars, please go read https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

At-least-once delivery requires neither CAP consistency (linearisability) nor CAP availability (any non-failed node must return a response in a non-infinite time), but is a very useful property!
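
As a rough illustration of the "effectively once" idea mentioned above (at-least-once delivery plus dedupe on the consumer side), here is a minimal sketch. The class name, the in-memory set of seen IDs, and the example message IDs are all hypothetical; a real consumer would need to persist the seen IDs somewhere durable.

```python
class DedupingConsumer:
    """Processes each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self, handler):
        self.handler = handler   # callable applied to each new payload
        self.seen_ids = set()    # in-memory only; an assumption made for brevity

    def deliver(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.handler(payload)
        # Record the ID only after the handler succeeds, so a crash mid-handler
        # leads to a redelivery (at-least-once) rather than a silent drop.
        self.seen_ids.add(message_id)
        return True


processed = []
consumer = DedupingConsumer(processed.append)
# The broker redelivers message "1-0", e.g. after a missed acknowledgement.
for message_id, payload in [("1-0", "a"), ("2-0", "b"), ("1-0", "a")]:
    consumer.deliver(message_id, payload)
```

The dedupe only helps if the broker actually redelivers; if the message is lost before any consumer sees it, there is nothing to deduplicate.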


Hello, streams have basically the same characteristics as any other Redis data structure. From the POV of a local node you can configure strong persistence on disk, but on node failure you only get a tunable amount of best-effort consistency, which means you cannot guarantee that no messages are lost. So basically you can:

1. Use the default asynchronous replication, and live with the fact (if the use case permits this) that on failover the message may not yet have reached the slave that will be promoted.

2. Use WAIT to force synchronous replication to N slaves. Under complex partitions this still does not make Redis guarantee, in mathematical terms, that the failover will pick a slave that received the message, but it narrows the real-world failure modes that lose data to more "unlikely" cases. You still have only best-effort consistency, but with better real-world outcomes.

So Redis streams will be a good choice if one of the above is acceptable.
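
Option 2 above could look roughly like the following sketch. It assumes a client object exposing redis-py style `xadd(name, fields)` and `wait(num_replicas, timeout)` methods; the helper name `xadd_with_wait` and its defaults are hypothetical. WAIT applies to writes sent on the same connection, so with a pooled client that is an assumption worth checking.

```python
def xadd_with_wait(client, stream, fields, min_replicas=1, timeout_ms=100):
    """Append to a stream, then block until replicas acknowledge or the timeout expires.

    `client` is assumed to behave like a redis-py client; nothing here is
    specific to a particular Redis deployment.
    """
    entry_id = client.xadd(stream, fields)
    # WAIT reports how many replicas acknowledged the preceding writes.
    acked = client.wait(min_replicas, timeout_ms)
    # The entry is already on the master either way. False only means the
    # caller should treat it as "possibly lost on failover" and maybe retry.
    return entry_id, acked >= min_replicas
```

A caller would typically retry or alert when the second element is False. As noted above, a True result still does not guarantee that the acknowledged slave is the one promoted on failover.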

I forgot to add that with the Redis modules API for the cluster, it could be possible in the future to write a module exposing a CP version of XADD without changing Redis default semantics.

What is the consistency model of Redis?

It sounds like anything can be lost in redis during normal HA operations even with WAIT pushing to a majority of slaves. Is that right?

Eventually consistent unless you're only using a single node. I believe that Redis itself commits to disk at various checkpoints in time, so if a failure happens, you're really only guaranteed to fail over into a pool of data that's consistent up to the last checkpoint of the node you're moving to.

EDIT: And as antirez said above, you can use WAIT to force synchronization to N slaves, so if the node you fail over onto didn't sync in time, it would most likely be missing only the latest message (n-1 of n). That still isn't guaranteed, however.
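
To make the failover window concrete, here is a toy model in plain Python. It is not Redis code: `Node`, `replicate`, and `lost_on_failover` are hypothetical names, and replication lag is simulated by copying only a prefix of the master's log.

```python
class Node:
    def __init__(self):
        self.log = []  # messages this node has stored


def replicate(master, replica, up_to):
    """Copy the master's log to the replica, but only the first `up_to` entries."""
    replica.log = master.log[:up_to]


def lost_on_failover(master, replica):
    """Messages the master accepted that the promoted replica never received."""
    return master.log[len(replica.log):]


master, replica = Node(), Node()
master.log = ["m1", "m2", "m3"]           # all three were accepted by the master
replicate(master, replica, up_to=2)       # replication lags by one message
lost = lost_on_failover(master, replica)  # master dies, replica is promoted
```

In this model the promoted replica is missing only the tail of the log. Waiting for acknowledgement shrinks that tail but, per the discussion above, does not rule out promoting a replica that never acknowledged.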

That's good to know. Thanks for the explanation!

Not currently useful to me, but I'm sure this hits the sweet spot for a bunch of people.

Last I checked, neither Redis Sentinel nor Redis Cluster was a linearizable system; you get none of C, A, or P. Redis Cluster failed Aphyr's Jepsen tests back in 2013. I'm not sure what the current status is, but I don't think the fundamental architecture has changed since then.

With vanilla Redis master/slave replication, I believe the best way to avoid data loss is to set replication to be synchronous (it's async by default) so that slaves are always guaranteed to be in sync with the master, in case you need to promote (using Sentinel) a slave to master.