Hacker News new | ask | show | jobs
by samokhvalov 2321 days ago
Well, not really. Two key differences:

1) dump/restore approach is one of "thick cloning" methods. It will take a really significant amount of time. Roughly, thick cloning speed for modern hardware and networks is somewhat ~1 TiB / hour just for transferring over network and writing to disk. Additionally, in the case of dump/restore, the timing will depend on the number of indexes that need to be built, materialized views to be refreshed, etc.

2) Dump/restore is "logical copying", it means that physical structure is not preserved. Logically, you'll get the same data. Physically – a completely different database. Bloat is gone, indexes are "fresh and good", and so on.

With Database Lab, unless you're on RDS, initial copying is preferably done using some "physical" methods (pg_basebackup, or just rsync, or – the best option – restoration from WAL-G/pgBackRest/Barman/etc backups). The physical structure is preserved. You have an identical copy of the original database. Then you request "thin clones", it really takes a couple of seconds for a multi-terabyte database. And can do various kinds of performance troubleshooting and experiments before you deploy your changes to production. Request 5 thin clones and check your idea 5 times, each time adjusting it a little bit. This is what usually really needed in development.

2 comments

Ahh okay, that makes sense. I wasn't aware that using different methods of copying + backing up data had tangible effects on it later. Super interesting, and vital if you're looking to gather accurate data from experiments.

By the way, I recognized you guys from the other tool you publish, Nancy. While I don't have a direct usecase for it myself, I thought the idea was super valuable as well.

I am a huge Postgres guy (did you know you can build an entire web app in JUST Postgres?[0]) so I had bookmarked a couple of your projects.

Thank you for the response, you guys do some fantastic work.

[0]: https://github.com/aquametalabs/aquameta

Thank you! I'll check out aquameta, looks interesting.
Can you explain what a thin clone is?
Sure. Thin cloning is a way to obtain a clone of data that looks fully independent (you can modify, add, delete data), but physically multiple clones share physically the same data -- except those blocks that were changed. This is implemented using copy-on-write -- see, for example, https://en.wikipedia.org/wiki/Copy-on-write#In_computer_stor...