Hacker News new | ask | show | jobs
by pawelduda 1204 days ago
I see this doesn't yet support something I've wanted for some time:

What's the simplest way to snapshot a database, perform an operation in app, snapshot it again and get an overview of how many rows were affected in each table? Like an overview of diff. Optionally list changed pkey ids. I tried googling but couldn't find anything like this, so I was thinking of making a DYI solution but wasn't that desperate yet.

5 comments

I'd guess that https://neon.tech/ could help with this. Neon is branded as "Serverless Postgres". They have APIs to create branches of your database.

So you could effectively:

1) Create a snapshot of your production DB -> DB_2A

2) Then create a snapshot of that snapshot -> DB_2B

3) Now you have two copies of the exact same database. Run your query/workload/migration on DB_2B.

4) Run some metadata queries against DB_2A and DB_2B and compare the results.

5) If your metadata queries are inline with expectations, delete the snapshots. If not, leave them around for a bit for manual inspection.

Interesting. Will check it out in detail. From a quick glance tho I'm afraid that vendor-specific cloud is a no-go because I would rather not upload any client's data in there, even if anonymized. Besides, it's an overkill to convince anyone to switch to a different vendor for sake of diffing DB alone :)
We were looking at tooling for snapshotting production data for testing purposes, that use case is interesting to have the diff view based on changes happening from application actions. Would you use it for testing changes or more for debugging production issues?
Well, my main use case would be to speed up project onboarding where I can play around with the app from user perspective and check how my actions impact database... Could help cutting through a lot of frontend/backend layers and just focus on raw data.

If you have multiple DB snapshots and WAL enabled, theoretically you could inspect the log to see what happened inbetween. There's pg_xlogdump for that but I think it will output very raw data...

How would you want to actually look at the database diff? Just a summary view of rows that were changed given a certain time period?
Dolthub does that https://www.dolthub.com/
Why not use aws rds snapshots to create a new replica in no-time?
I wonder if you could utilize templates to snapshot the DB.
This is what tools like `dslr` do and it's very fast. My problem wasn't backing up/restoring though but rather diffing data in these 2 databases :)
It seems like it should be doable to write something that generically diffs tables.