I also think it's really interesting stuff and love working on it. I'll definitely write something about that if/when I do a blog post about it. But it's quite simple at its core. As the current post mentions, we use Raft to do it. We simply run a cluster of 3 nodes, each in a different zone. If one zone goes down, 2 of the 3 nodes are still up, which is still a majority, so that's enough to keep the cluster running. I recommend reading the Raft paper for more details; it's one of the easiest papers to read and understand that I've ever found. https://raft.github.io/raft.pdf
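
To make the majority arithmetic concrete, here's a minimal sketch in Python. It is not the actual implementation and it doesn't implement Raft itself; it only shows the quorum check for a cluster with one node per zone. The zone names and the `quorum` / `has_quorum` helpers are made up for illustration.

```python
from typing import Dict


def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(zones_up: Dict[str, bool]) -> bool:
    """True if the nodes that are still up form a majority.

    Assumes exactly one node per zone (hypothetical layout).
    """
    alive = sum(1 for up in zones_up.values() if up)
    return alive >= quorum(len(zones_up))


# Hypothetical 3-node cluster, one node per zone.
all_up = {"zone-a": True, "zone-b": True, "zone-c": True}
one_zone_down = {"zone-a": True, "zone-b": False, "zone-c": True}
two_zones_down = {"zone-a": True, "zone-b": False, "zone-c": False}

for name, state in [
    ("all up", all_up),
    ("one zone down", one_zone_down),
    ("two zones down", two_zones_down),
]:
    print(name, "->", "has quorum" if has_quorum(state) else "no quorum")
```

With 3 nodes the quorum is 2, so the cluster survives losing any one zone but not two. The same arithmetic says a 5-node cluster would tolerate 2 failures.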