Hacker News
by mdasen 1292 days ago
Unlike your code which you can redeploy after a bit of downtime, you might not be able to un-f^ck your database. I think that's ultimately the selling point. No one wants to be responsible for keeping the data safe when it's not their job. Do you work at a company that is going to applaud you for testing your backups? If not, you're wasting your time doing that. Do you work at a company that is going to promote you for getting high-availability right before an outage happens?

Some people certainly do work for companies like that. I'm sure big places like Facebook or Apple or Netflix applaud and promote people for this - and have the scale at which having people working on these problems makes sense. At your startup with a few dozen people? Probably not. If you're not building the product, you're not helping the company succeed. Ok, you're saving the company a bit of money, but at the cost of your time and the cost of the company actually getting product out the door, finding product-market-fit, etc.

Do you want to use your employee time setting up a HA database cluster and saving the company $1,000 per month or developing your app?

That's a key question: pay Linode $1,560/mo for a 3-node 32GB RAM cluster, or launch three 32GB boxes for $720, figure out PostgreSQL replication, make sure you set up the replication users, make sure you don't open any security holes, make sure you have Patroni or Stolon so that you can switch over when the primary fails, make sure you have etcd or Consul or something to handle that coordination, make sure that you have Barman or pgBackRest set up to take your WAL and persist it to S3, set up your S3 buckets, set up a full backup schedule so that you can restore easily, make sure that you're regularly testing your restores, make sure that you're testing your failover (do you even know if Patroni is actually working?), and figure out how your app is going to cut over to the new primary when that happens (is there a shared IP that you need to move, are you using a proxy like HAProxy that runs health checks to decide which node to route to, etc.). Or would you rather just pay someone $1,000/mo so that you don't have to deal with that?
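For the "how does your app find the new primary" step, one common pattern is a health check that asks each node's Patroni REST API whether it is the leader. Patroni answers `GET /primary` with HTTP 200 only on the node holding the leader lock. A minimal Python sketch of that check follows. It assumes Patroni's REST API is reachable on every node, and the node URLs you pass in are your own.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional


def patroni_is_primary(base_url: str, timeout: float = 2.0) -> bool:
    """Ask one node's Patroni REST API whether it currently holds the leader lock."""
    try:
        with urllib.request.urlopen(f"{base_url}/primary", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Replicas answer 503, which urllib raises as HTTPError (an OSError subclass).
        # Unreachable or timed-out nodes also end up here. Treat all of them as "not primary".
        return False


def find_primary(
    nodes: Iterable[str],
    probe: Callable[[str], bool] = patroni_is_primary,
) -> Optional[str]:
    """Return the first node that reports itself as primary, or None if none does."""
    for node in nodes:
        if probe(node):
            return node
    return None
```

Usage would look like `find_primary(["http://pg1:8008", "http://pg2:8008", "http://pg3:8008"])`. The hostnames are hypothetical, and 8008 is just the port commonly used in Patroni examples. In practice HAProxy can run the equivalent HTTP check itself against that port, so the app only ever talks to HAProxy. The sketch just shows what that check is asking.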

I hate paying up for something I feel like I should be able to do myself, but it does make some sense. If I decide I don't want to pay for Google Cloud Run and my servers all die, I can boot up some new boxes and get my app running again with some downtime. That's not great, but it's recoverable. If I don't want to pay for Google Cloud SQL and my servers die, now I'm hoping that my backups were working and that I can bring a much more complicated deployment back online than just some random process or container. One of those two just carries more risk. Yes, backups should work and should be tested, and you should test backups even with a managed service, but if you're a startup trying to move fast and find product-market fit, I'm guessing that the premium is worth paying to save your engineers that time. I hate saying that because cloud providers are pushing such high margins, but it's probably true.
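Here is a minimal Python sketch of what "backups should be tested" can mean for a simple single-server setup:

- Take a nightly `pg_dump` in custom format.
- Ship it off the box with `rsync`.
- Periodically prove the dump is usable by restoring it into a throwaway database.

The sketch only builds command lines and shells out to the standard PostgreSQL client tools. The database name, backup directory, rsync destination, and scratch database name are all hypothetical. Connection settings are assumed to come from the usual `PG*` environment variables.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List


def backup_commands(dbname: str, backup_dir: str, remote: str, day: datetime.date) -> List[List[str]]:
    """Commands for one nightly backup: dump the database, then copy the dump off the box."""
    dump_path = Path(backup_dir) / f"{dbname}-{day.isoformat()}.dump"
    return [
        # Custom format is compressed by default and is restored with pg_restore.
        ["pg_dump", "--format=custom", f"--file={dump_path}", dbname],
        ["rsync", "-az", str(dump_path), remote],
    ]


def restore_check_commands(dump_path: str, scratch_db: str = "restore_check") -> List[List[str]]:
    """Commands that restore a dump into a fresh scratch database (hypothetical name)."""
    return [
        ["dropdb", "--if-exists", scratch_db],
        ["createdb", scratch_db],
        # Stop at the first error so a broken dump fails loudly instead of half-restoring.
        ["pg_restore", "--exit-on-error", f"--dbname={scratch_db}", dump_path],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A nightly cron job might call `run(backup_commands("app", "/var/backups/pg", "backup-host:/srv/pg/", datetime.date.today()))`. A separate scheduled job might call `run(restore_check_commands(...))` on the latest dump. A restore that completes is a first check, not proof the data is what you expect.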

Out of curiosity, do you run your own databases? If so, which? Do you find that everything Just Works, or that it's a pain to get everything running, debug things, test backups, etc.? Is this in a high-traffic, commercial situation or just as a hobby? I think hosting your own database is relatively easy if you're just going to have one server, pg_dump a backup nightly, and rsync it somewhere else. In the rare event that you lose a server, you might have an hour of downtime. If you're able to recover within 50 minutes and you lose a server every week, you're still at 99.5% uptime (50 of the week's 10,080 minutes). If you can recover in 40 minutes and you lose a server every month, you're at 99.9% uptime (40 of a 30-day month's 43,200 minutes). Do we need more? How often does a VM go down? (And don't say "I have 1,000 servers and I lose one every other day": that would mean the average instance lasts several years.) Won't Google's live migration of VMs handle a lot of that?
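The uptime figures above come from dividing the downtime per failure by the time between failures. A small Python sketch to check them, and the 1,000-server example, assuming a 7-day week, a 30-day month, and a 365.25-day year:

```python
MINUTES_PER_DAY = 24 * 60


def availability(downtime_minutes: float, days_between_failures: float) -> float:
    """Fraction of time the service is up if each failure costs `downtime_minutes`."""
    return 1 - downtime_minutes / (days_between_failures * MINUTES_PER_DAY)


def mean_instance_lifetime_years(fleet_size: int, days_between_fleet_failures: float) -> float:
    """A fleet losing one server every N days implies each server lasts about fleet_size * N days."""
    return fleet_size * days_between_fleet_failures / 365.25


print(f"{availability(50, 7):.2%}")   # weekly failure, 50 min to recover -> 99.50%
print(f"{availability(40, 30):.2%}")  # monthly failure, 40 min to recover -> 99.91%
print(f"{mean_instance_lifetime_years(1000, 2):.1f} years")  # -> 5.5 years
```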

So it's all trade-offs, but I don't think we live in a world where companies are willing to say "we're not going to architect for HA, and we'll suffer an hour or two of downtime every year or two when we lose a box." And sometimes we certainly build complicated systems that end up with complicated failure scenarios of their own.

Still, I think managed databases are an easy sell, even at their premium price point.


> That's a key question: pay Linode $1,560/mo for a 3-node 32GB RAM cluster, or launch three 32GB boxes for $720, figure out PostgreSQL replication, make sure you set up the replication users, make sure you don't open any security holes, make sure you have Patroni or Stolon so that you can switch over when the primary fails, make sure you have etcd or Consul or something to handle that coordination, make sure that you have Barman or pgBackRest set up to take your WAL and persist it to S3, set up your S3 buckets, set up a full backup schedule so that you can restore easily, make sure that you're regularly testing your restores, make sure that you're testing your failover (do you even know if Patroni is actually working?), and figure out how your app is going to cut over to the new primary when that happens (is there a shared IP that you need to move, are you using a proxy like HAProxy that runs health checks to decide which node to route to, etc.).

A little off topic: it's funny how often you see comments on HN along the lines of "why use Kubernetes? It's just overly complex." But then those same people most likely go back to their self-managed DB clusters, which need all the maintenance and setup just listed. If they were using Kubernetes, they would have Operators or Helm charts that do 90% of that work.

(Now back to topic) Don't get me wrong: I'll choose a managed DB all day, every day over something self-managed, but sometimes (e.g. a startup without an unlimited amount of VC money) self-managing infrastructure is the fastest and cheapest way to go.

> Still, I think managed databases are an easy sell, even at their premium price point.

Except Fly Postgres isn't a managed offering, unlike, say, CrunchyBridge, PlanetScale, Alloy, or Aurora. It's pretty much a "Fly app" you have to tend to yourself.

But fly.io also isn't charging a premium over the base infra cost (are they? I don't see Fly Postgres on their pricing page).

Fly Postgres is just a Fly app. It's open source; you can grab it off GitHub and deploy it yourself, and we'll never know. No, we're not upcharging for Postgres.

The way I like to think about this is in terms of risk.

Different companies, or even different teams within the same company, will have different risk tolerance. The thing with managed services is that part of the premium pays for all the bells and whistles... but what the managed premium buys may not line up with what customers actually need.

So I suspect the important thing here is that customers realize what they're getting and have proper expectations set. In this case, unless I'm missing something, that means you're not getting a managed database service from fly.io; you're getting an OSS tool that makes running Postgres on fly.io a bit easier. It's kind of like a database controller for Kubernetes: it helps you automate some things, but it's still just software running on your cluster.

And then it's up to those customers to decide whether that's an acceptable risk for them. Maybe a bunch of customers will vote with their wallets on whether this works for them. Maybe there will be enough demand that some partner specializing in database tech, like Neon, Crunchy, or Cockroach, will offer a service targeting fly.io specifically, or maybe fly.io will get stuck building it themselves if customers demand it.

Lots of maybes, so at least I'll be interested to follow this and see how it develops.

Not only that, you have to figure out how to fix the thing if the HA contraption breaks.

There's also security to think about (Do you run a CA? How do you handle Postgres certificates, etcd certificates, rotation, revocation, etc.?)
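On the certificate point, rotation starts with knowing when things expire. Below is a minimal Python sketch, stdlib only, that fetches the certificate a Postgres server presents and reports the days until it expires.

Postgres has traditionally negotiated TLS in-band. The client first sends the protocol's SSLRequest message (length 8, code 80877103) and only wraps the socket if the server answers `S`.

The host, port, and CA file are assumptions. The default SSL context verifies the chain and hostname, so `cafile` should point at whatever CA you run.

```python
import socket
import ssl
import struct
import time
from typing import Optional

SSL_REQUEST_CODE = 80877103  # Postgres wire-protocol code for SSLRequest


def fetch_postgres_cert(host: str, port: int = 5432, cafile: Optional[str] = None,
                        timeout: float = 5.0) -> dict:
    """Open a TCP connection, request TLS the Postgres way, and return the verified peer cert."""
    ctx = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # SSLRequest: Int32 message length (8) followed by Int32 request code.
        sock.sendall(struct.pack("!ii", 8, SSL_REQUEST_CODE))
        if sock.recv(1) != b"S":
            raise RuntimeError("server did not agree to TLS")
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert: dict, now: Optional[float] = None) -> float:
    """Days from `now` (epoch seconds, default: current time) until the cert's notAfter."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

Something like `days_until_expiry(fetch_postgres_cert("db.internal", cafile="ca.pem")) < 30` (hypothetical host and file) could feed an alert. It says nothing about revocation, which is the harder half of the question.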

Some hosted providers also have nice value-adds like query-level performance monitoring and DBA services.