by Reitet00 1515 days ago
> - fork: ever had one of those "why does this bug only exist in production?" problems? It was so trivial to fork the DB and run your tests/hypothesis/whatever without the risk of actually impacting production. Same thing for _really_ testing a migration script or load test.

This sounds great! How does it work though? Is it using some special postgres feature or btrfs snapshots or something else completely?

Craig (the poster I jumped in to reply to) would know the specifics better than I ever did. My recollection is:

- restore from the latest snapshot (there was one whether you’d configured a custom backup schedule or not)

- replay the write-ahead log (WAL) over the top to bring the restore up to the point in time you asked for, or to the moment you ran the command. At least part of this process used WAL-E, an open-source tool largely developed by Heroku employees.
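
To make those two steps concrete, here is a toy model in Python. It is only a sketch of the idea, not how Heroku or WAL-E actually implemented it: the "database" is a dict, and `WalRecord` and `restore_to_point_in_time` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class WalRecord:
    """One logged change (hypothetical shape, for illustration only)."""
    committed_at: datetime
    key: str
    value: Optional[str]  # None models a delete


def restore_to_point_in_time(snapshot, snapshot_taken_at, wal, target):
    """Build a fork of the data as it stood at `target`.

    Step 1: copy the base snapshot. Step 2: replay every WAL record
    committed after the snapshot, up to and including `target`.
    """
    if target < snapshot_taken_at:
        raise ValueError("target predates the snapshot; an older snapshot is needed")
    fork = dict(snapshot)  # the fork is a copy, so the source is never modified
    for record in sorted(wal, key=lambda r: r.committed_at):
        if record.committed_at <= snapshot_taken_at:
            continue  # already reflected in the snapshot
        if record.committed_at > target:
            break  # everything after this point is "the future" for the fork
        if record.value is None:
            fork.pop(record.key, None)
        else:
            fork[record.key] = record.value
    return fork


snapshot = {"plan": "hobby"}
snapshot_taken_at = datetime(2020, 1, 1, 2, 0)
wal = [
    WalRecord(datetime(2020, 1, 1, 2, 5), "plan", "standard"),
    WalRecord(datetime(2020, 1, 1, 2, 10), "owner", "craig"),
    WalRecord(datetime(2020, 1, 1, 2, 20), "plan", None),
]

fork = restore_to_point_in_time(
    snapshot, snapshot_taken_at, wal, target=datetime(2020, 1, 1, 2, 15)
)
print(fork)  # {'plan': 'standard', 'owner': 'craig'}
```

The delete logged at 02:20 falls after the 02:15 target, so it never reaches the fork, and the original snapshot is left untouched.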

This was a decade or more ago, though. The state of the art in Postgres has moved on, and I assume the team would tackle it differently if they were doing it today.

It's leveraging pretty native Postgres tooling that restores the base backup from within Postgres, then replays the WAL to the exact point in time you specify. With snapshots and other mechanisms you may get a database "up" sooner, but we've seen that when we follow that approach, the PG cache takes so long to warm up that you effectively still have a useless database even though it's "up". Further, depending on how you do it, Postgres itself will have to go through crash recovery, which I've seen take over 10 hours on some providers.

Doing it the native Postgres way isn't perfect, but we've focused on getting the developer experience right so that you can use your database and it "just works", and if something goes wrong you understand how to roll back seamlessly.

Very cool! Thanks for taking the time to describe this in detail, Craig (and Glenn).