Hacker News new | ask | show | jobs
by rembicilious 701 days ago
From the article: “If the user isn’t actively working on the database, PostgreSQL won’t generate a WAL file each minute. WAL files are by default 16 MB and PostgreSQL waits for the 16 MB to fill up. So, your restore granularity could be much longer, and you may not be able to restore to a particular minute in time.

You can overcome this problem by setting the archive_timeout and forcing Postgres to generate a new WAL file every minute. With this configuration, Postgres would create a new WAL file when it hits the 1-minute or 16 MB threshold, whichever comes first.

The second issue with backup/restore is “no activity”. In this case, PostgreSQL wouldn’t create a new file even if the archive_timeout is set. As a solution, you can generate artificial write activity by calling pg_current_xact_id().”

Can you explain why to create a WAL file even though there is no activity?

3 comments

We want to ensure that the users are able to restore to any minute in the past.

Then, as @pintxo, @bjornsing, and @Doxin asked below; Why not let users pick a time and then replay the closest backup? Why create empty backups? This is a valid question.

The answer is that without an empty backup, it's not possible to confirm there was no write activity. Maybe some write activity occurred but you fail to archive the WAL file. You need at least one WAL file (empty or not) with a timestamp equal to or more recent than the target restore time to ensure there were no write activity.

I didn't explain this in detail in the blog post for the sake of brevity, but you don't need to create an empty WAL file for each minute with no write activity. For instance, if there was no write activity for 10 minutes and then a WAL file is generated, PostgreSQL can deduce there was no write activity during those 10 minutes and is able restore to any minute within that period. However, if the user wants to restore to 5 minutes ago and there has been no write activity for the past 10 minutes, PostgreSQL doesn't have a WAL file to confirm lack of write activity. In this case, you need to call pg_current_xact_id() to generate a WAL file. So the trick is, when you get a restore request, call pg_current_xact_id() only once, just in case if there were no write activity.

This is a good example of difference between running PostgreSQL for yourself vs. others. When running PostgreSQL for yourself, such edge cases is not a concern. However, when managing it for others, an edge case like this could prevent a restore and create an operational burden. Automating this process is required especially if you are managing lots of PostgreSQL databases.

Thank you, that makes a lot of sense. I should have intuited that, but I couldn’t quite make the connection. I really appreciate the write up, it gave me a lot to think about as someone who has only ever managed my own DB.
"Point in time restore" is a bad way to call the feature if you don't let your customers pick a moment in time to restore to, so those tricks ensure that there's enough WAL entries to allow people to pick with per-second granularity.
One can let users pick a time and then just replay the closest backup? Why create empty backups?
> those tricks ensure that there's enough WAL entries to allow people to pick with per-second granularity

How can they ensure that? Sounds more like they ensure that people can pick point in time with minute granularity.

I’m no expert, but doesn’t postgres write a timestamp with each record in the WAL? I know that when you start postgres in recovery mode it lets you specify a timestamp you want to restore to, and I don’t think that timestamp needs to be on a WAL file boundary. (So the tricks seem weird / unnecessary to me.)

But if there has been no activity you surely can just pick the most recent log that's older than the time the user picked?
How do you know the WAL was not lost during transmission, or the server crashed before transferring the WAL even though there were writes after the last WAL you have in your backup?

That's why.

The point is to create a WAL file if there is a little activity, but not enough to fill 16MB.