A question for HN: what filesystem and/or block-device abstraction layer would you use on a database server, if you wanted to perform scheduled incremental backups using filesystem-level consistent snapshotting and differential snapshot shipping to object storage, instead of using the DBMS’s own replication layer to achieve this effect? (I.e. you want disaster recovery, not high availability.) Or, to put that another way: what are AWS and GCP using in their SANs (EBS; GCE PD) that allows them to take on-demand incremental snapshots of SAN volumes, and then ship those snapshots away from the origin node into safer out-of-cluster replicated storage (e.g. object storage)? Is it proprietary, or is it just several FOSS technologies glued together? My naive guess would be that the cloud hosts are either using ZFS volumes, or LVM LVs (which do have incremental snapshot capability, if the disk is created in a thin pool) under iSCSI. (Or they’re relying on whatever point-solution VMware et al sold them.) If you control the filesystem layer (i.e. you don’t need to be filesystem-agnostic), would Btrfs snapshots be better for this same use-case?
Filesystem snapshots are a legitimate way of backing up databases, but it's not quite as simple as just taking a snapshot. For PostgreSQL, for example, you still need to call pg_start_backup() (renamed pg_backup_start() in PostgreSQL 15) before the snapshot and the matching stop function afterwards, and ensure your WAL archives are properly stored in your object storage system for point-in-time recovery. Without the database-specific precautions, your snapshots will still be crash-consistent (as long as one atomic snapshot covers both the data directory and the WAL) and most likely usable in some manner, but not quite proper backups.
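To make the ordering concrete, here is a rough sketch in Python of a snapshot taken inside a PostgreSQL backup window. It assumes PostgreSQL 15+ function names and a ZFS dataset; `snapshot_backup`, `execute_sql` and `run_command` are hypothetical names for this example, and the two callables are placeholders for whatever database driver and process runner you actually use. The label is interpolated into the SQL, so it is assumed to be a trusted constant.

```python
from typing import Callable, Sequence, Tuple


def snapshot_backup(
    execute_sql: Callable[[str], object],
    run_command: Callable[[Sequence[str]], object],
    dataset: str,
    label: str,
) -> Tuple[str, object]:
    """Snapshot `dataset` between pg_backup_start() and pg_backup_stop().

    execute_sql (hypothetical) must run every statement on the SAME session:
    a non-exclusive backup is aborted if that session disconnects.
    run_command (hypothetical) runs an argv list, e.g. via subprocess.run.
    """
    snapshot = f"{dataset}@{label}"
    # PostgreSQL 15+ name; older releases call this pg_start_backup().
    execute_sql(f"SELECT pg_backup_start('{label}')")
    try:
        run_command(["zfs", "snapshot", snapshot])
    finally:
        # Always leave backup mode, even if the snapshot command failed.
        stop_result = execute_sql("SELECT * FROM pg_backup_stop()")
    # stop_result carries the backup_label contents; the caller should store
    # them alongside the shipped snapshot so a restore replays archived WAL.
    return snapshot, stop_result
```

With a real driver, `execute_sql` would wrap a single long-lived connection, and `run_command` could be `lambda argv: subprocess.run(argv, check=True)`. Shipping the snapshot and the WAL archive to object storage is a separate step that this sketch leaves out.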
Using Btrfs or ZFS as the database filesystem has its own footguns. For example, the default record size of ZFS datasets (128 KiB) doesn't match the page size of most databases (8 KiB for PostgreSQL, 16 KiB for InnoDB), so if you forget to take that into account, you'll very likely see rather terrible performance.
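As a back-of-the-envelope illustration of the record size mismatch, the Python sketch below computes how many database pages fit in one ZFS record, which is roughly the worst-case amplification when a single page is rewritten at random. It assumes the ZFS default recordsize of 128 KiB and PostgreSQL's default 8 KiB page; the function names and the `tank/pgdata` dataset are made up for the example, and whether matching the page size exactly is the best setting for a given workload (compression, sequential scans) is something to benchmark rather than assume.

```python
from typing import List

KIB = 1024


def pages_per_record(recordsize_bytes: int, db_page_bytes: int) -> int:
    """How many database pages share one ZFS record.

    Rewriting one page can force ZFS to rewrite the whole record, so this
    ratio is a rough upper bound on write amplification for random updates.
    """
    if db_page_bytes <= 0 or recordsize_bytes % db_page_bytes != 0:
        raise ValueError("recordsize must be a positive multiple of the page size")
    return recordsize_bytes // db_page_bytes


def recordsize_command(dataset: str, db_page_bytes: int) -> List[str]:
    """Build the `zfs set` argv that matches recordsize to the page size."""
    return ["zfs", "set", f"recordsize={db_page_bytes // KIB}K", dataset]


print(pages_per_record(128 * KIB, 8 * KIB))        # 16
print(recordsize_command("tank/pgdata", 8 * KIB))  # ['zfs', 'set', 'recordsize=8K', 'tank/pgdata']
```

Note that changing recordsize only affects blocks written afterwards, so it is best set before the database files are created.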