|
Unlike your code, which you can redeploy after a bit of downtime, you might not be able to un-f^ck your database. I think that's ultimately the selling point. No one wants to be responsible for keeping the data safe when it's not their job. Do you work at a company that is going to applaud you for testing your backups? If not, you're wasting your time doing that. Do you work at a company that is going to promote you for getting high availability right before an outage happens? Some people certainly do work for companies like that. I'm sure big places like Facebook or Apple or Netflix applaud and promote people for this - and have the scale at which having people work on these problems makes sense. At your startup with a few dozen people? Probably not. If you're not building the product, you're not helping the company succeed. OK, you're saving the company a bit of money, but at the cost of your time and of the company actually getting product out the door, finding product-market fit, etc. Do you want to spend your employees' time setting up an HA database cluster and saving the company $1,000 per month, or developing your app? 
That's a key question: pay Linode $1,560/mo for a 3-node 32GB RAM cluster or launch three 32GB boxes for $720/mo, figure out PostgreSQL replication, make sure you set up the replication users, make sure you don't open any security holes, make sure you have Patroni or Stolon so that you can switch over when the primary fails, make sure you have etcd or Consul or something to handle that coordination, make sure that you have Barman or pgBackRest set up to take your WAL and persist it to S3, set up your S3 buckets, set up a full backup schedule so that you can restore easily, make sure that you're regularly testing your restores, make sure that you're testing your failover (do you even know if Patroni is actually working?), figure out how your app is going to cut over to the new primary when that happens (is there a shared IP that you need to move, are you using a proxy like HAProxy that runs health checks to see which node it should proxy to, etc.). Or would you rather just pay someone $1,000/mo so that you don't have to deal with that? I hate paying up for something I feel like I should be able to do myself, but it does make some sense. If I decide I don't want to pay for Google Cloud Run and my servers all die, I can boot up some new boxes and get my app running again with some downtime. That's not great, but it's recoverable. If I don't want to pay for Google Cloud SQL and my servers die, now I'm hoping that my backups were working, that I can bring a much more complicated deployment back online than just some random process or container, etc. One of those two just carries more risk. Yes, backups should work and should be tested, and you should even test backups in a managed service, but if you're a startup trying to move fast and find product-market fit, I'm guessing that the premium is worth it to save your engineers that time. I hate saying that because cloud providers are charging such high margins, but it's probably true. As a curiosity, do you run your own databases? 
If so, which? Do you find that everything Just Works, or that it's a pain to get everything running, debug things, test backups, etc.? Is this in a high-traffic, commercial situation or just as a hobby? I think hosting your own database is relatively easy if you're just going to have one server and pg_dump -> rsync a backup nightly. In the rare event that you lose a server, you might have an hour of downtime. If you're able to recover within 50 minutes and you lose a server every week, you're still at 99.5% uptime. If you can recover in 40 minutes and you lose a server every month, you're at 99.9% uptime. Do we need more? How often does a VM go down (note: don't say "I have 1,000 servers and I lose one every other day" - that would mean the average instance is lasting several years)? Won't Google's live migration of VMs handle a lot of that? So, it's trade-offs, but I don't think we're living in a world where companies say "we're not going to architect for HA and we'll suffer an hour or two of downtime every year or two when we lose a box." Sometimes we certainly get ourselves into situations where we've built complicated systems that end up having complicated failure scenarios too. Still, I think managed databases are an easy sell, even at their premium price point. |
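For anyone who wants to check the uptime figures in the comment above, here is a minimal sketch of the arithmetic. It assumes exactly one outage per interval, a 30-day month, and a 365-day year; the function names are made up for illustration and are not from any library.

```python
MINUTES_PER_DAY = 24 * 60


def uptime_percent(recovery_minutes: float, days_between_failures: float) -> float:
    """Uptime percentage, assuming one outage of `recovery_minutes` per interval."""
    interval_minutes = days_between_failures * MINUTES_PER_DAY
    return 100.0 * (1.0 - recovery_minutes / interval_minutes)


def mean_instance_lifetime_years(fleet_size: int, days_between_fleet_failures: float) -> float:
    """Average time between failures for a single instance in the fleet, in years."""
    return fleet_size * days_between_fleet_failures / 365.0


# One server lost per week, 50 minutes to recover.
weekly = uptime_percent(50, 7)
# One server lost per month (assumed 30 days), 40 minutes to recover.
monthly = uptime_percent(40, 30)
# 1,000 servers, one lost every other day.
lifetime = mean_instance_lifetime_years(1000, 2)

print(f"weekly failure:  {weekly:.2f}% uptime")
print(f"monthly failure: {monthly:.2f}% uptime")
print(f"mean instance lifetime: {lifetime:.1f} years")
```

The results round to the 99.5% and 99.9% quoted above, and the fleet example works out to roughly five and a half years per instance, which is the "several years" in the parenthetical.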
A little off topic: it's funny how often you see comments on HN along the lines of "why use Kubernetes, it's just overly complex." But the people writing them are, most likely, going back to their self-managed DB clusters, which need all the maintenance and setup just listed. If they were using Kubernetes, they would have Operators or Helm charts that do 90% of that work.
(Now back on topic) Don't get me wrong, I'll choose a managed DB all day, every day over something self-managed, but sometimes (e.g. at a startup without an unlimited amount of VC money) self-managing infrastructure is the fastest and cheapest way to go.