by outworlder 1976 days ago
> As I mentioned earlier we run Postgres on i3.8xlarge instances in EC2, which come with about 7.6TB of NVMe storage.

Wait a second. You run your production database on ephemeral storage? Wow.

I see the replication setup and the S3 WAL archiving and whatnot but still... that's brave.

4 comments

It's really a question of how many replicas you have, whether you're running with sync rep or not, and what your DR story is like. We've tested it a few times at previous employers and are currently exploring rolling it out for Crunchy Bridge. The NVMe storage is really nice: the performance is great and the price balance is good as well. But it does come with nuances... I wouldn't let a user provision without HA, for example. For a standard app without crazy uptime requirements, having a standby or 2 is wasted cash. So it isn't for everyone, but it can be for some people.
Hi! Can you share how you do HA on Postgres? Master/slave with monitoring and manual failover, or is it automatic? If so, is it reliable? What tooling do you use? Thanks!
We are living life on the edge to an extent, but we have 5 hot standbys across AZs and regular backups + WAL archives to S3.

May not be as durable as EBS, but it's enough for me to sleep soundly at night. And with a highly concurrent WAL-G download, it takes like an hour to catch up a new replica from scratch.
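For a sense of scale, here is a back-of-the-envelope sketch of the sustained download rate that a one-hour rebuild implies. The thread doesn't give the actual database size, so the 7.6 TB figure below is just the instance's storage capacity used as an upper bound, and the function name is made up for illustration. It assumes decimal terabytes and ignores compression and WAL replay time.

```python
def required_throughput_gb_per_s(data_tb: float, hours: float) -> float:
    """Sustained rate (GB/s) needed to move data_tb terabytes in the given hours.

    Assumes decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_bytes = data_tb * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e9


# Upper bound: a completely full 7.6 TB volume restored in one hour.
print(f"{required_throughput_gb_per_s(7.6, 1.0):.2f} GB/s")  # prints "2.11 GB/s"
```

A smaller database or a compressed backup needs proportionally less than that.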

Fine, with enough replicas, you can sleep well at night. But how about the 3 years of uptime without a reboot? Can you really enjoy your morning coffee without thinking about it? :)

Netflix has gone full ephemeral storage for their Cassandra clusters since the beginning, back when the instances just had spinning disks. Years later, they still insist on doing this, and had to come up with a creative solution to fix the uptime issue: https://netflixtechblog.medium.com/datastore-flash-upgrades-...

From the parent comment:

> it takes like an hour to catch up a new replica from scratch

That means it should be pretty easy to replace an instance - just create a new replica from scratch, then fail over (if you're replacing the current master/primary instance), and remove the old one.
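A rough sketch of that ordering, just to make the steps explicit. This only plans the steps and doesn't touch a real cluster; the function name, the step labels, and the choice to fail over to the freshly built replica are all assumptions for illustration, not anything from the article.

```python
def plan_replacement(nodes, primary, old_node, new_node):
    """Return the ordered steps for swapping old_node out of the cluster.

    Hypothetical planner: step names are illustrative labels, not real commands.
    """
    if old_node not in nodes:
        raise ValueError(f"{old_node} is not in the cluster")
    if new_node in nodes:
        raise ValueError(f"{new_node} is already in the cluster")

    # 1. Build the new replica from scratch (e.g. backup restore + WAL catch-up).
    steps = [("build_replica", new_node)]

    # 2. Only needed when the node being replaced is the current primary.
    #    Assumption: we promote the new replica; any caught-up standby would do.
    if old_node == primary:
        steps.append(("failover", new_node))

    # 3. Retire the old instance once nothing depends on it.
    steps.append(("remove", old_node))
    return steps


# Replacing the primary needs a failover; replacing a standby does not.
print(plan_replacement(["pg1", "pg2"], "pg1", "pg1", "pg3"))
print(plan_replacement(["pg1", "pg2"], "pg1", "pg2", "pg3"))
```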

Why can't you reboot? NVMe storage is local, not ephemeral.
Yes, you can reboot, but you have to update in-place, instead of rolling out a new OS image. Or you adopt the Netflix approach. There are also some additional restrictions, e.g. you can't change the machine type.

Anyway, from the other comment here, I think tommyzli might not have realized that a reboot is still possible, which would partially explain the 3 years uptime.

This was pretty common in AWS back in the late 00s. Performance usually sucked too much otherwise.
Even with provisioned IOPS I once had to resort to RAID0 and replicas to get the needed performance under budget on EBS. Probably should have just bumped instance size and used local storage.
It's funny--we used to run Vertica on ephemeral nodes and actually found a performance improvement going to EBS, but that was pre-NVMe in AWS.

I wonder how big the delta was for CMB between EBS and ephemeral?