by samokhvalov 2322 days ago
Hi! Postgres.ai founder here.

This is requested quite often. For example, when we copy a database from production, we sometimes need to remove all personal data so as not to break regulations.

It is possible in Database Lab, but it's not a very user-friendly feature yet. Briefly, the process is as follows.

The "sync" Postgres instance is configured as a production replica (preferably using WAL shipping from the WAL archive). Then, periodically, a new snapshot is created; currently this is done with this Bash script: https://gitlab.com/postgres-ai/database-lab/-/blob/master/sc... (we are going to make it part of the database-lab server in upcoming releases).

Here https://gitlab.com/postgres-ai/database-lab/-/blob/master/sc... you can place any data transformations, so that the final snapshot used for thin cloning contains adjusted data. For example, all personal data can be removed or obfuscated.
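
To make "removed or obfuscated" more concrete, here is a minimal sketch of the kind of transformation one might apply. It is not taken from the Database Lab script: the real hook is a Bash script and would most likely run SQL against the snapshot, so this Python version only illustrates the logic. The field names (email, full_name), the salt value, and both function names are hypothetical.

```python
import hashlib


def mask_email(email: str, salt: str = "example-salt") -> str:
    """Replace an email with a deterministic placeholder.

    The same input always maps to the same output, so uniqueness and
    joins on this column keep working after masking. The salt is a
    hypothetical value and should be kept secret in a real setup.
    """
    normalized = email.strip().lower()
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}@example.com"


def mask_rows(rows):
    """Return copies of the rows with personal fields removed or obfuscated."""
    masked = []
    for row in rows:
        clean = dict(row)                      # do not mutate the input row
        clean["email"] = mask_email(row["email"])  # obfuscate
        clean["full_name"] = None              # remove entirely
        masked.append(clean)
    return masked


if __name__ == "__main__":
    sample = [{"id": 1, "email": "Alice@Corp.io", "full_name": "Alice A."}]
    for row in mask_rows(sample):
        print(row)
```

Deterministic masking is only one option; it is used here because it preserves uniqueness in the masked column, which plain nulling would not.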

Of course, if you do this, keep in mind that physically you'll have a different database. This may affect some kinds of testing (for example, troubleshooting bloat issues or some cases of index performance degradation). There are various choices to be made here. If you're interested, we'll be happy to help: please join our community Slack, which is mentioned in the docs and README.