Hacker News
by glenngillen 1513 days ago
disclaimer: former Heroku person here too

Some examples of the database developer-experience features I've missed, which Craig and the team made possible at Heroku Postgres, include:

- fork: ever had one of those "why does this bug only exist in production?" problems? It was so trivial to fork the DB and run your tests/hypothesis/whatever without the risk of actually impacting production. Same goes for _really_ testing a migration script or running a load test.

- follow: a similarly easy way to get a read replica, which is super useful for generating reports.

- dataclips: "hey, can you tell me X?" Sure, and here's a URL to the results that you can refresh if you need an updated number in the future. So great for ad hoc queries.
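For readers wondering what "follow" corresponds to in stock Postgres: one common way to get a read replica (Postgres 12 and later) is a streaming standby, i.e. a base backup of the primary plus a `standby.signal` file and a `primary_conninfo` setting. The sketch below only renders those files as text; the `follow_settings` helper, host name, and user are hypothetical, and this is not a description of how Heroku implemented follow.

```python
# A minimal sketch, assuming Postgres 12+, where recovery.conf was replaced by
# signal files plus ordinary settings. The helper, host, and user names are
# hypothetical; this is not Heroku's implementation of "follow".


def follow_settings(primary_host: str, primary_port: int = 5432,
                    user: str = "replicator") -> dict[str, str]:
    """Return file name -> contents for a streaming read replica.

    The replica's data directory is assumed to already hold a base backup
    of the primary (for example one taken with pg_basebackup).
    """
    conninfo = f"host={primary_host} port={primary_port} user={user}"
    settings = "\n".join([
        # Where the standby connects to stream WAL from the primary.
        f"primary_conninfo = '{conninfo}'",
        # Allow read-only queries (e.g. reporting) while WAL is replayed.
        # This has been the default since Postgres 10.
        "hot_standby = on",
    ]) + "\n"
    return {
        # An empty standby.signal starts the server as a standby that keeps
        # following the primary instead of promoting itself.
        "standby.signal": "",
        # Lines to append to the replica's postgresql.conf.
        "postgresql.conf": settings,
    }


if __name__ == "__main__":
    for name, body in follow_settings("primary.example.internal").items():
        print(f"--- {name} ---")
        print(body, end="")
```

In practice `pg_basebackup --write-recovery-conf` (`-R`) can create `standby.signal` and write the connection settings for you; the sketch just makes the moving parts visible.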

All of these are obviously doable with RDS and/or other solutions too. But the time taken to do any of the above was often measured in seconds, at most minutes. It's difficult to communicate just how impactful those kinds of improvements are to your workflow. It's like it subconsciously gives you permission to tackle whole new problems, build better solutions, and get answers to questions you never thought to ask before. Because the barrier to entry is so low, you just do these things. You don't sit around wondering if you could.

A great developer experience around a database (one that goes beyond setup and basic ops) is a severely underappreciated thing IMO.


> - fork: ever had one of those "why does this bug only exist in production?" problems? It was so trivial to fork the DB and run your tests/hypothesis/whatever without the risk of actually impacting production. Same thing for _really_ testing a migration script or load test.

This sounds great! How does it work though? Is it using some special postgres feature or btrfs snapshots or something else completely?

Craig (the poster I jumped in to reply to) would know the specifics better than I ever did. My recollection is:

- restore from the latest snapshot (there was one whether you’d configured a custom backup schedule or not)

- replay the write-ahead log (WAL) over the top to catch the restore up to the point in time you asked for, or to when you ran the command. At least part of this process used WAL-E, an open-source tool largely developed by Heroku employees.
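The two steps above (restore the latest snapshot, then replay the write-ahead log up to the requested moment) can be illustrated with a toy in-memory model. This is not Postgres or WAL-E code: the `Snapshot` and `WalRecord` types and the `fork_at` function are hypothetical, the "database" is a plain dict, and each WAL record is just a timestamped key/value write.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Snapshot:
    taken_at: datetime
    data: dict[str, str]  # full copy of the toy database at taken_at


@dataclass
class WalRecord:
    committed_at: datetime
    key: str
    value: str


def fork_at(snapshots: list[Snapshot], wal: list[WalRecord],
            target: datetime) -> dict[str, str]:
    """Rebuild the toy database as it was at `target` (point-in-time restore)."""
    # Step 1: start from the latest snapshot taken at or before the target.
    candidates = [s for s in snapshots if s.taken_at <= target]
    if not candidates:
        raise ValueError("no snapshot old enough for the requested target")
    base = max(candidates, key=lambda s: s.taken_at)
    db = dict(base.data)  # copy, so the snapshot ("production") is untouched

    # Step 2: replay, in commit order, every write made after the snapshot
    # up to and including the target. Writes committed exactly at taken_at
    # are assumed to already be in the snapshot.
    for rec in sorted(wal, key=lambda r: r.committed_at):
        if base.taken_at < rec.committed_at <= target:
            db[rec.key] = rec.value
    return db


if __name__ == "__main__":
    def hour(h: int) -> datetime:
        return datetime(2024, 1, 1, h)

    snapshots = [Snapshot(hour(0), {"a": "1"}), Snapshot(hour(6), {"a": "2"})]
    wal = [WalRecord(hour(7), "a", "3"), WalRecord(hour(8), "b", "x"),
           WalRecord(hour(9), "a", "4")]
    # Fork "as of 08:00": the 06:00 snapshot plus the 07:00 and 08:00 writes.
    print(fork_at(snapshots, wal, hour(8)))  # {'a': '3', 'b': 'x'}
```

The fork is built from copies, which is the property that makes it safe to poke at without touching the source database.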

This was a decade or more ago though. The state of the art in Postgres has moved on, and I assume the team would tackle it differently if they were doing it today.

It leverages fairly native Postgres tooling that restores the base backup from within Postgres, then replays the WAL to the exact point in time you specify. With snapshots and other mechanisms you may get a database "up" sooner, but when we've followed that approach, the Postgres cache takes so long to warm up that you effectively still have a useless database even though it's "up". Furthermore, depending on how you do it, Postgres itself will have to go through crash recovery, which I've seen take over 10 hours on some providers.
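Roughly what "restore the base backup, then replay the WAL to the exact point in time you specify" looks like with stock Postgres recovery settings (Postgres 12 and later) is sketched below. This is a hedged illustration, not the tooling described in the comment: the archive directory is hypothetical, the `cp`-based `restore_command` mirrors the example in the Postgres documentation, and the `pitr_settings` helper only renders the text you would place in the restored data directory.

```python
from datetime import datetime, timezone

# A minimal sketch, assuming Postgres 12+ and a base backup already restored
# into a fresh data directory. The helper and archive path are hypothetical.


def pitr_settings(archive_dir: str, target: datetime) -> dict[str, str]:
    """Return file name -> contents for a point-in-time restore ("fork")."""
    if target.tzinfo is None:
        raise ValueError("use a timezone-aware target to avoid ambiguity")
    conf = "\n".join([
        # How Postgres fetches each archived WAL segment: %f is the segment's
        # file name, %p the path to copy it to.
        f"restore_command = 'cp {archive_dir}/%f \"%p\"'",
        # Stop replaying WAL once this timestamp is reached.
        f"recovery_target_time = '{target.isoformat(sep=' ')}'",
        # Then end recovery and become a writable, independent server.
        "recovery_target_action = 'promote'",
    ]) + "\n"
    return {
        # An empty recovery.signal requests targeted recovery rather than
        # standby mode.
        "recovery.signal": "",
        # Lines to append to the restored server's postgresql.conf.
        "postgresql.conf": conf,
    }


if __name__ == "__main__":
    files = pitr_settings("/mnt/wal-archive",
                          datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc))
    for name, body in files.items():
        print(f"--- {name} ---")
        print(body, end="")
```

Starting Postgres on that data directory replays archived WAL until `recovery_target_time`, then promotes, leaving a writable copy that is independent of the original primary.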

The native approach in Postgres isn't perfect, but we've focused on polishing the developer experience around it so you can use your database and it "just works", and if something goes wrong you understand how to roll back seamlessly.

Very cool! Thanks for taking the time to describe this in detail Craig (and Glenn).