by makmanalp 2316 days ago
I'd ask the opposite question: at what scale would you want to have your own custom setup rather than RDS? Managing your own database infrastructure for workloads other than "a few queries a second" is hard work with a lot of pitfalls, and you'd better be at a size where there's some benefit (high levels of customization, use-case-specific tuning, economies of scale, etc.). As a person who does exactly this for a living, I'd rather shell out for RDS or a similar offering than run my own setup most of the time. Especially at first, before you discover what exactly you /don't/ like about it or what you'd want to be different.

Is it hard work though? In a couple of hours you should be able to set up automatic backups and practice going through the recovery process a couple of times. That's all there is for most small-business setups, but if you are daring you can now do whatever you want with the config file, install extensions, and set up basic system monitoring (CPU/RAM usage, disk usage, etc.). GCP/Digital Ocean let you look at node resource usage automatically, and since Postgres is probably the only process, you don't even need to set that up!
> In a couple of hours you should be able to set up automatic backups and practice going through the recovery process a couple of times.

Unfortunately there's a lot more to it than that. You need to handle the case where the backup job fails or dies, have a process for deleting old backups, etc. Not just that, but if you have multiple Postgres instances, you need to do this work for each machine. I've seen firsthand this kind of stuff become a huge distraction. It's often worth it to pay AWS a bit more in exchange for not worrying about this stuff.

> Unfortunately there's a lot more to it than that.

Is there though? Consider what I would argue to be the "average" case:

* Your database never exceeds 40% resource usage

* You service fewer than 1m queries/day

* You never burst more than 1k queries/minute

* You have a script tied to a cronjob that backs up the database, with basic error handling that sends you a Slack DM if it fails

* You have a script tied to a cronjob which deletes old backups, with basic error handling that sends you a Slack DM if it fails
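As a rough illustration of the two cron jobs in the list above, here is a minimal Python sketch. It assumes backups are files ending in `.dump` in a single directory, and that `notify` is any callable that delivers a message (the Slack DM would go there; it is not implemented here). The names `run_job` and `prune_old_backups` are hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def run_job(cmd: Sequence[str], notify: Callable[[str], None]) -> bool:
    """Run a command; call notify() with a short message if it fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing binary, permission errors, and a hung job.
        notify(f"backup job could not run: {exc}")
        return False
    if result.returncode != 0:
        notify(f"backup job exited {result.returncode}: {result.stderr.strip()[:500]}")
        return False
    return True


def prune_old_backups(directory: Path, max_age_days: float,
                      now: Optional[float] = None) -> List[Path]:
    """Delete *.dump files older than max_age_days; return what was deleted.

    The newest backup is always kept, so a silently broken backup job
    cannot lead to the pruner deleting the last remaining copy.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    backups = sorted(directory.glob("*.dump"), key=lambda p: p.stat().st_mtime)
    deleted = []
    for path in backups[:-1]:  # skip the newest file
        if path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

A cron entry would call something like `run_job(["pg_dump", "--format=custom", "--file=/backups/db.dump", "mydb"], notify)`, where the path and database name are placeholders. Note that this only reports failures of a job that actually started; it says nothing if cron never runs it.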

What percentage of companies need more than that?

> * Your database never exceeds 40% resource usage
> * You service fewer than 1m queries/day
> * You never burst more than 1k queries/minute

How do I know it doesn't exceed 40% usage? Better yet, who's holding the pager when it does? If/when it does, whose product launch is dead in the water while the db is reconfigured onto a larger instance? What product isn't being delivered because we're faffing about with the database instead of product code?
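To make the "how do I know" question concrete, here is a minimal sketch of the kind of check someone would have to write, schedule, and own. It uses only the Python standard library; the 40% limit comes from the list being quoted, the function names are hypothetical, and `os.getloadavg` is only available on Unix-like systems. Load average divided by CPU count is a crude proxy for CPU usage, not a precise measurement.

```python
import os
import shutil
from typing import Dict, List


def disk_fraction_used(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def load_fraction() -> float:
    """1-minute load average divided by CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def over_threshold(readings: Dict[str, float], limit: float = 0.40) -> List[str]:
    """Names of the readings that exceed the limit, sorted alphabetically."""
    return sorted(name for name, value in readings.items() if value > limit)


if __name__ == "__main__":
    readings = {"disk": disk_fraction_used("/"), "load": load_fraction()}
    for name in over_threshold(readings):
        print(f"{name} is above 40%: {readings[name]:.0%}")
```

The check itself is short; the open questions in the comment (who gets paged, and what happens next) are not answered by it.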

> * You have a script tied to a cronjob that backs up the database, with basic error handling that sends you a Slack DM if it fails
> * You have a script tied to a cronjob which deletes old backups, with basic error handling that sends you a Slack DM if it fails

Who's responsible for restoring from backup every week/month/quarter, to assert they actually work, with whatever changes have been made recently? Untested backups are Schrödinger's backups.

Just how well tested is this script? Does it properly error out if the script fails to run? What if a firewall rule accidentally gets set that blocks egress from the backup box to the Internet (for security); who/how/what gets notified instead? Whose deliverables are slipping because the backups randomly stopped working?
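One common way to catch a backup job that never ran at all is a separate freshness check, ideally run from a different machine than the backup box. A minimal sketch, assuming backups are `.dump` files in one directory and that a 26-hour limit suits a daily schedule (both are assumptions, as are the function names):

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(directory: Path,
                            now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recent *.dump file, or None if there are none."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in directory.glob("*.dump")]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_stale(directory: Path, max_age_hours: float = 26,
                    now: Optional[float] = None) -> bool:
    """True if there is no backup at all, or the newest one is too old."""
    age = newest_backup_age_hours(directory, now)
    return age is None or age > max_age_hours
```

This only shows that a recent file exists. It does not show that the file can be restored, which is the separate point about testing restores above.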

> What percentage of companies need more than that?

That's a fair question, but Amazon's done far more research on that topic than I have, and possibly more than you have. The real question is, of the companies that don't need more than that, how many want to hire somebody to take on those responsibilities part-time? How many companies have the expertise to even hire somebody qualified to do that part-time? And since those people are managing the DB part-time, how many of them are giving it the attention it needs, and aren't distracted by other responsibilities to the company?

None of those problems are insurmountable, but they're far from most businesses' core competency, and time I'm spending dealing with postgresql.conf (or my.cnf) is time I'm not dealing with other issues. Don't get me wrong, there's still a time and place for managing database instances, but IMO small businesses (small > tiny) aren't the appropriate place for that. I'd be interested in hearing if someone's run the numbers to justify it though! (Especially if it falls in favor of running it yourself.)

Any that can't afford more than a couple minutes of downtime when a server fails.
That's definitely not an "average" company. It's also a really small number of companies that truly can't afford that, as opposed to ones that would just "earn less money than usual".
I've used both too, and I think it really depends on whether you want to pay for it. For small to medium independent projects, I think an EC2 instance that sometimes goes past the free tier usage is fine; RDS can be overkill and can seriously eat into costs. I would say a beginner doing independent projects should strongly consider EC2 instances instead of RDS.