by malisper 2316 days ago
> In a couple of hours you should be able to set up automatic backups and practice going through the recovery process a couple of times.

Unfortunately there's a lot more to it than that. You need to handle the backup job failing or dying, have a process for deleting old backups, etc. On top of that, if you have multiple Postgres instances, you need to do this work for each machine. I've seen firsthand this kind of stuff become a huge distraction. It's often worth paying AWS a bit more so you don't have to worry about it.


> Unfortunately there's a lot more to it than that.

Is there though? Consider what I would argue to be the "average" case:

* Your database never exceeds 40% resource usage

* You service fewer than 1m queries/day

* You never burst more than 1k queries/minute

* You have a script tied to a cronjob that backs up the database, with basic error handling that sends you a Slack DM if it fails

* You have a script tied to a cronjob which deletes old backups, with basic error handling that sends you a Slack DM if it fails
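To make the last two bullets concrete, here's a rough sketch of what those two cron scripts might look like. Everything named here is an assumption for illustration: the `*.dump` naming, the function names, and the Slack incoming webhook (which posts to whatever channel or DM it was configured for). Treat it as a starting point, not a hardened tool.

```python
import json
import subprocess
import time
import urllib.request
from pathlib import Path


def slack_notifier(webhook_url):
    """Return a notify(text) function that posts to a Slack incoming webhook.

    The webhook URL is something you create yourself (assumption).
    """
    def notify(text):
        req = urllib.request.Request(
            webhook_url,
            data=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=10)
    return notify


def run_backup(dump_cmd, out_path, notify):
    """Run dump_cmd and write its stdout to out_path.

    Returns True on success. On failure, removes the partial file,
    calls notify(message), and returns False.
    """
    out_path = Path(out_path)
    try:
        with open(out_path, "wb") as out:
            subprocess.run(dump_cmd, stdout=out, stderr=subprocess.PIPE, check=True)
        if out_path.stat().st_size == 0:
            raise RuntimeError("dump produced an empty file")
        return True
    except (OSError, subprocess.CalledProcessError, RuntimeError) as exc:
        out_path.unlink(missing_ok=True)
        notify(f"backup failed: {exc}")
        return False


def prune_backups(backup_dir, keep_days, notify, now=None):
    """Delete *.dump files older than keep_days; return the deleted file names."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    deleted = []
    try:
        for path in sorted(Path(backup_dir).glob("*.dump")):
            if path.stat().st_mtime < cutoff:
                path.unlink()
                deleted.append(path.name)
    except OSError as exc:
        notify(f"backup pruning failed: {exc}")
    return deleted
```

The cron entry point would then call something like `run_backup(["pg_dump", "--format=custom", "mydb"], "/backups/mydb-2024-01-01.dump", notify)` followed by `prune_backups("/backups", 14, notify)`, where the database name, paths, and retention period are placeholders.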

What percentage of companies need more than that?

> * Your database never exceeds 40% resource usage
> * You service fewer than 1m queries/day
> * You never burst more than 1k queries/minute

How do I know it doesn't exceed 40% usage? Better yet, who's holding the pager when it does? If/when it does, whose product launch is dead in the water while the db is reconfigured onto a larger instance? What product isn't being delivered because we're faffing about with the database instead of product code?

> * You have a script tied to a cronjob that backs up the database, with basic error handling that sends you a Slack DM if it fails
> * You have a script tied to a cronjob which deletes old backups, with basic error handling that sends you a Slack DM if it fails

Who's responsible for restoring from backup every week/month/quarter, to verify that the backups actually work, with whatever changes have been made recently? Untested backups are Schrödinger's backups.
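For what it's worth, a periodic restore drill might look roughly like the sketch below. It assumes a custom-format dump (the kind `pg_restore` reads), that the standard Postgres client tools are on the PATH, and that you can pick one query whose answer tells you the data is sane. The function name, scratch database name, and query are all hypothetical.

```python
import subprocess


def restore_drill(dump_path, scratch_db, sanity_sql, run=subprocess.run):
    """Restore dump_path into a throwaway database and run one sanity query.

    Returns the query output as a stripped string. With the default runner,
    a failing step raises subprocess.CalledProcessError. The scratch
    database is dropped afterwards either way.
    """
    # Start from a clean slate in case a previous drill was interrupted.
    run(["dropdb", "--if-exists", scratch_db], check=True)
    run(["createdb", scratch_db], check=True)
    try:
        run(
            ["pg_restore", "--exit-on-error", "--no-owner",
             "--dbname", scratch_db, str(dump_path)],
            check=True,
        )
        result = run(
            ["psql", "--dbname", scratch_db, "--set=ON_ERROR_STOP=1",
             "--no-align", "--tuples-only", "--command", sanity_sql],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()
    finally:
        run(["dropdb", "--if-exists", scratch_db], check=True)
```

Somebody still has to own the schedule and read the result, e.g. compare `restore_drill("/backups/latest.dump", "restore_test", "SELECT count(*) FROM orders")` against a number you'd expect. The script only removes the tedium, not the responsibility.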

Just how well tested is this script? Does it properly error out if the script fails to run at all? What if a firewall rule accidentally gets set that blocks egress from the backup box to the Internet (for security); who/how/what gets notified instead? Whose deliverables are slipping because the backups randomly stopped working?
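The "script never ran" case is the one a failure alert inside the script can't catch, so it needs a separate freshness check, ideally run from a different machine than the backup box. A minimal sketch, assuming backups land as `*.dump` files in one directory and that `notify` is whatever alerting callable you already have (both assumptions):

```python
import time
from pathlib import Path


def newest_backup_age(backup_dir, now=None):
    """Return the age in seconds of the newest *.dump file, or None if none exist."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).glob("*.dump")]
    return now - max(mtimes) if mtimes else None


def check_freshness(backup_dir, max_age_hours, notify, now=None):
    """Alert via notify() if there is no backup newer than max_age_hours.

    Returns True when a fresh backup exists, False otherwise.
    """
    age = newest_backup_age(backup_dir, now)
    if age is None:
        notify(f"no backups found in {backup_dir}")
        return False
    if age > max_age_hours * 3600:
        notify(f"newest backup is {age / 3600:.1f} hours old")
        return False
    return True
```

This still doesn't answer the blocked-egress question: if the checker can't reach Slack either, nobody hears anything, which is why the check and the alert path shouldn't share a failure mode with the backup job.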

> What percentage of companies need more than that?

That's a fair question, but Amazon's done far more research on that topic than I have, and possibly more than you have. The real question is, of the companies that don't need more than that, how many want to hire somebody to take on those responsibilities part-time? How many have the expertise to even hire somebody qualified to do that part-time? And since those people are managing the DB part-time, how many of them are giving it the attention it needs, and aren't distracted by their other responsibilities to the company?

None of those problems are insurmountable, but they're far from most businesses' core competency, and time I'm spending dealing with postgresql.conf (or my.cnf) is time I'm not dealing with other issues. Don't get me wrong, there's still a time and place for managing database instances, but IMO small businesses (small > tiny) aren't the appropriate place for that. I'd be interested in hearing if someone's run the numbers to justify it though! (Especially if it falls in favor of running it yourself.)

Any that can't afford more than a couple of minutes of downtime when a server fails.

That's definitely not an "average" company. And the number of companies that truly can't afford that downtime, as opposed to just earning less money than usual while it lasts, is really small.