| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by AznHisoka 4615 days ago

So this is a technical problem I am having right now that's preventing me from backing up a Postgres database completely (hope someone here can help).

I have a master Postgres database that is receiving a TON of transactions per second (I'm talking about thousand concurrent transactions). We tried running pg_dump on this database, but the DB is just too huge, and it took more than 4 days to completely dump out everything. Not only that but it impacted performance to the point where backing it up was just not feasible.

No problem.. just create a slave-DB and run pg_dump on that, right? We did just that, but the problem is that you can't run long running queries on a hot standby (queries that take more than a minute).

What would you do in my scenario? With the hot standby, I technically am backing up my data, but I would have 100% piece of mind if I could daily backups in case someone accidentally ran a "DROP DATABASE X", which would also delete the hot standby/slave db as well.

8 comments

wiredfool 4615 days ago

There's a setting in postgresql.conf that will let you up the limit on long running queries on the standby from 30 seconds to ~ unlimited.

http://www.postgresql.org/docs/9.0/static/runtime-config-wal... See max_standby_archive_delay and max_standby_streaming_delay, -1 lets them wait forever.

Alternately, you can issue pg_start_backup('label'), backup the filesystem, then issue pg_stop_backup() and keep all the WAL logs from that time. That'll get you a base backup similar to the slave.

What I'm doing is this:

I've got a primary/hot spare pair, and a tertiary db on lesser equipment that's my second copy for cases where I have one of the main machines down or I have to rebuild the secondary from the primary.

The tertiary db ships logs to s3, after gpging them. Every $timeframe, I take a base backup and throw it up as well. I keep a couple, and delete the older ones. Every few months, I test a restore on ec2. There's a balance between the WAL logs that you need to keep, the time to restore, and the frequency of base backups.

[edit - parameter names. Further edit - strategy.]

jtchang 4615 days ago

I use postgres and ran into this issue as well.

Inside postgres.conf for the slave I have the following:

# These settings are ignored on a master server.

hot_standby = on # "on" allows queries during recovery # (change requires restart) max_standby_archive_delay = 900s # max delay before canceling queries # when reading WAL from archive; # -1 allows indefinite delay max_standby_streaming_delay = 900s # max delay before canceling queries # when reading streaming WAL; # -1 allows indefinite delay #wal_receiver_status_interval = 10s # send replies at least this often # 0 disables #hot_standby_feedback = off # send info from standby to prevent # query conflicts

So I set it to 15 minutes for this specific backup server which I am okay with. I already have another server with much shorter time delays.

AznHisoka 4615 days ago

So you basically sacrifice speed of replication in order to ensure long running queries don't get cancelled?

wiredfool 4615 days ago

It's sacrificing the expected latency of replication.

Incidentally, if you're on 9.3 and your HW can handle it, take a look at parallelizing the pg_dump. If you've got a relatively fast disk subsystem and many cores, you can get a speedup. I've found it tends to make the dumps O(biggest table) instead of O(sum of all tables).

(It's native on 9.3, I've hacked up some scripts that do it for 9.0, but they don't get a consistent snapshot, so I do it during scheduled downtime. OTOH, the dump/restore is ~6x faster OMM/OMD, so the downtime is that much shorter)

sillysaurus2 4615 days ago

Was the database designed using transactions to achieve consistency? If so, then you can just instruct Tarsnap to back up the folder containing your database every day, and you're done.

If the DB uses transactions for consistency, you can copy it at any time without any problems.

wiredfool 4615 days ago

As long as you issue the pg_start_backup/pg_stop_backup pair and keep the WAL logs. If you don't, then you've got a corrupt backup.

At least you would catch that problem in your first test restore.

sillysaurus2 4615 days ago

... what? The point of consistency is that if the power to your server is cut, then you can reboot and pick up precisely where you left off. That means the database on disk must have consistency. Meaning you should be able to copy it at any point in time without any problems. If you can't, then that's not consistency, and if postgres really works that way, then it's failing one of the basic tenants of being a database. http://en.wikipedia.org/wiki/ACID

Any database that purports to have consistency must be able to withstand cutting the power to the server at any time. And if it can do that, then it must be true that you can copy the database folder at any time, too, without any special commands. (pg_start_backup is not issued before every power loss, so why would it need to be issued before a copy?)

On the other hand, if postgres doesn't support consistency, then that'd be a major reason not to use it.

EDIT: I'd run the server in a VM and backup VM snapshots. VMware makes this painless (and the snapshotting process is designed to have minimal impact disk I/O performance for precisely the scenario the OP described). VirtualBox probably has something similar. These replies seem crazily overcomplicated in comparison.

wiredfool 4615 days ago

Unless tarsnap does something like LVM snapshotting, then it's not going to get a consistent snapshot. You can't just copy the directory of an active server.

See: http://www.postgresql.org/docs/9.0/static/continuous-archivi... section 24.3.2. Making a Base Backup.

IF you have something like LVM of ZFS doing snapshots, then you can just tar the data directory.

wiredfool 4615 days ago

Re: VMs

Leaving aside the management issues of huge vm images and the less than ideal io performance, the ACID guarantees of pg rely on the underlying hardware obeying some specific restrictions, including real fsync and not lying about when things are on permanent storage. Getting the drives and raid controllers to obey that has historically been a difficult, ongoing job that has to be redone with each new generation of hardware. SSDs have been particularly interesting with that, the actual flush to disk can be quite delayed from the logical write. Some have supercaps, some don't. Those that don't are vulnerable to power losses while the data is still in the drive's ram awaiting a block erase and write. The IDE drivers used to flat out lie. Enterprise SAS drives often come with the write caching turned on (since it looks better in benchmarks) even though they're often times used behind a battery backed raid controller.

Adding a VM layer to that just to get snapshots seems overly complicated and prone to issues.

nierman 4615 days ago

If you're taking an instantaneous snapshot of the system then yes. A standard copy/rsync/etc. isn't going to give you that. If the copy takes a long time at what point do you grab the pg_xlog directory? and are all the files there that you need/ed?

alexkus 4615 days ago

ACID doesn't apply since you can't copy a large file in an instant. The copy takes time, in which time the files on disk can change. This isn't the same thing as the server losing power.

VM snapshots, zfs snapshots, etc are the way to go.

emeraldd 4615 days ago

There's something that works and there's the right way to do it. It's better to do things the right way if you want to make sure everything is in a good state when you bring it back and there aren't edge cases you missed ... What if someone forgot to use a transaction?

lucian1900 4615 days ago

The WAL is used for recovering from power loss. You need both the db files and your WAL to get a backup.

AznHisoka 4615 days ago

Does this interfere with the replication process at all if I run pg_start and pg_stop_backup, and rsync the files to another server?

wiredfool 4615 days ago

Nope. I do it all the time.

The start/stop backup has to be issued on the master. It doesn't look like the standby gets the backup label (at least on 9.0, may have changed since). So you'd have to be reading from the master's data directory.

Alternately, you could stop the secondary and pull from there. But that interrupts the replication, and then the secondary would have to catch up, which might be hard depending on your level of usage.

feel free to email or chat my un on freenode.

viraptor 4615 days ago

What filesystem are you running on? Can you snapshot it outside of the postgres environment? The database may be mid-transaction at that point, but it's still better if it does log replay at startup, than losing all the data.

Also if your filesystem snapshots can be exposed as files / block devs, you can rsync them to another host lowering the amount of transferred data (keep the previous copy so rsync will only copy the blocks that differ).

khaless 4615 days ago

Just a thought... If your storage layer has support for taking a consistent snapshot of your file system then you might be able to use this to get a backup.

You would get a copy of your database that you would need to run log-replay recovery on but after that it should be all good.

mikeocool 4615 days ago

pg_dump is a logical backup, that is, as you've seen, it queries the all of the data in your database in writes to a file in the form of queries that will re-create all of your data in a new database. This great and very flexible, but as you've seen has some limitations.

You probably want to look into physical backups, where you basically copy the actual files that postgres is actually using to store your databases on disk (although it's not quite that simple, so do some googling on it). This has the nice advantage of not requiring you to run queries against your database to back it up. It also gives you a consistent point-in-time backup of your database.

FooBarWidget 4615 days ago

Sounds to me like you have no choice but to shard your database in order to reduce the write load on a single database.

Or upgrade your database servers' hardware (more RAM, faster CPU, faster SSD) to the point where you can dump the database.

jtchang 4615 days ago

By the way if you need help feel free to e-mail/ping me.