| Backups? Do you want to share your idea about how you'd do backups? Especially to a distributed database? Here are some of the questions you'll have to answer and some options you will have to consider before you go there: Let's start with the heavy stuff: consistency groups. I.e. groups of bulk storage that underlines your entire infrastructure that ensure that your application and database(s) all recover to the shared state once they crash. To better explain this concept, consider this: you have an application that works with two databases, let's say a document database to store documents uploaded by users (which are later parsed by the application and transformed into records in a relational database). Now, each database provides best consistency guarantees... but they still can fail independently and subsequently recover to different state, where, for example, the document database can be ahead of the relational one (and lose some data). Similar problems face sharded databases. How geographically far are you going to send your backups? You see, the closer to the working server they are, the higher is the chance you'll lose them together. But, here's the problem: the further away the backups are, the lower is your ability to keep the backup up-to-date with the database, and, subsequently, more data to lose. Well, backups inherently lose data (for the time between the last backup and the time of the crash). So, if you don't want to lose data at all, you probably want replication rather than backups. And you probably want online replication (but then the distance between the replicas is even more important than in the case with backups). Also, backups are huge. If you want to ship them outside of the facilities of the storage vendor... that's going to be expensive. Another point to consider: databases provide consistency guarantees, but does your database provide consistency guarantees you want? Is every relation encoded by using foreign keys, or does the application have some knowledge of how to interpret pieces of data and stitch them together into relationships unknown to your database? Are you sure that every operation that requires atomicity is implemented in a database rather than application (which doesn't enforce atomicity)? What if you stick a backup (recovery point) in a precise moment when your application was doing something that was meant to be atomic, but the application author didn't know how to express in SQL (because in their fear of technology they chose to use Hybernate or SQLAlchemy etc.)? And if you do so, it spoils your backup... |
However, we are talking about Postgres, here, not a generic database. PostgreSQL natively provides continuous backup, streaming replication, including synchronous (controlled at transaction level), cascading, and logical. You can easily implement with Postgres, even in Kubernetes with CloudNativePG, architectures with RPO=0 (yes, zero data loss) and low RTO in the same Kubernetes cluster (normally a region), and RPO <= 5 minutes with low RTO across regions. Out of the box, with CloudNativePG, through replica clusters.
We are also now launching native declarative support for Kubernetes Volume Snapshot API in CloudNativePG with the possibility to use incremental/differential backup and recovery to reduce RTO in case of very large databases recovery (like ... dozens of seconds to restore 500GB databases).
So maybe it is time to reconsider some assumptions.