by bmurphy 5378 days ago
Very nice, thank you. Almost all of this is good advice and matches my experience.

One thing though:

"EBS volumes and Software RAID is best but scary on AWS"

I've managed an EBS RAID10 database for a few years now, and I wouldn't touch this setup with a ten-foot pole.

Do yourself a favor: set up an m1.xlarge (or larger) instance, put the ephemeral drives in a RAID0 array, and mirror across multiple machines using hot standby, Slony, Londiste, or some other replication tool. You'll be much happier, your system will perform much better, and you'll have a failover strategy in place.
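For anyone who hasn't built one, the RAID0 step is usually done with Linux software RAID (`mdadm`). The sketch below only builds the command line rather than running it. The device names and the `/dev/md0` target are assumptions, since ephemeral device names vary by instance type and AMI, and `raid0_create_cmd` is a hypothetical helper, not part of any tool.

```python
import shlex


def raid0_create_cmd(devices, md_device="/dev/md0"):
    """Return (without running) an mdadm argv that stripes `devices` into RAID0."""
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two devices")
    return [
        "mdadm", "--create", md_device,
        "--level=0",                       # striping only, no redundancy
        f"--raid-devices={len(devices)}",
        *devices,
    ]


# Hypothetical ephemeral device names; check lsblk on your own instance.
ephemeral = ["/dev/xvdb", "/dev/xvdc", "/dev/xvdd", "/dev/xvde"]
print(shlex.join(raid0_create_cmd(ephemeral)))
```

You would then put a filesystem on the array and the database on top. Nothing on it survives the instance going away, so the replica on another machine is what actually protects the data.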


I don't understand this. Is there a difference between an explicitly allocated EBS volume and the ephemeral volumes of an EBS-backed instance?

Or are you focusing on the RAID10 part? Everywhere I look, RAID10 is touted as the best RAID level (the right balance between performance and safety).
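On the RAID10 question, the trade-off comes down to usable capacity versus how many disk failures the array survives. Here is a rough sketch, assuming classic 2-way mirror pairs striped together (md's RAID10 also supports other layouts, which this ignores); `raid_profile` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidProfile:
    level: str
    usable_disks: int         # capacity available for data, in whole disks
    guaranteed_failures: int  # failures survived no matter which disks fail
    best_case_failures: int   # failures survived if each hits a different mirror pair


def raid_profile(level: str, n_disks: int) -> RaidProfile:
    if level == "raid0":
        if n_disks < 2:
            raise ValueError("RAID0 needs at least 2 disks")
        # Striping only: every disk holds data; any single failure loses the array.
        return RaidProfile("raid0", n_disks, 0, 0)
    if level == "raid10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("this sketch assumes an even number of disks, at least 4")
        pairs = n_disks // 2
        # Stripe across mirror pairs: half the raw capacity, one guaranteed
        # failure, and up to one failure per pair in the best case.
        return RaidProfile("raid10", pairs, 1, pairs)
    raise ValueError(f"unsupported level: {level}")


for level in ("raid0", "raid10"):
    print(raid_profile(level, 4))
```

RAID0 gives all of the capacity and none of the redundancy. The advice above accepts that inside a single box because the redundancy comes from replicating to other machines instead.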

EBS is a shared, networked SAN. Its performance is not that great and, even worse, highly variable. The last thing you want to run your database on is a system whose performance varies greatly throughout the day with no control on your part.
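If you want to check the variability claim on your own volumes, one crude probe is to time small synchronous writes (roughly what a database commit does) and compare the tail to the median at different times of day. This is a hypothetical helper (`fsync_latencies`), not a proper benchmark; the block size and sample count are arbitrary assumptions, and the path must point at the volume you want to test.

```python
import os
import statistics
import time


def fsync_latencies(path: str, samples: int = 200, block: int = 8192) -> dict:
    """Time `samples` rounds of write + fsync of `block` bytes to a scratch file.

    Returns median, 99th-percentile and max latency in milliseconds.
    The scratch file is removed afterwards.
    """
    if samples < 2:
        raise ValueError("need at least 2 samples for percentiles")
    buf = os.urandom(block)
    timings_ms = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the block to stable storage, like a commit would
            timings_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    # "inclusive" interpolates within the observed range instead of extrapolating.
    cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(timings_ms),
        "p99": cuts[98],
        "max": max(timings_ms),
    }
```

For example, `fsync_latencies("/mnt/data/fsync_probe.tmp")`, where the mount point is hypothetical. A large gap between p50 and p99, or between runs at different hours, is the kind of inconsistency described above.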

The ephemeral drives are directly attached to the server and, to the best of my knowledge, are not a shared resource. Their performance is highly consistent, but if your server goes down, all data on those drives is lost.

EBS sounds nice in theory, but by moving to EBS RAID you throw away most of its benefits (such as snapshotting) and take on its worst aspects.

Ephemeral volumes are shared, just not nearly as variable.
Yes, there is a difference between EBS volumes and ephemeral volumes of an EBS-backed instance.
And what happens when the AZ goes down? Don't you lose data then? Or do you mean multiple machines across AZs or regions?
You'll want the mirror in a different AZ, or you could even put it in a different region.

Use WAL-E to push WAL logs to S3.

Absolutely. You should always spread your data across multiple availability zones and, where feasible, across multiple data centers, and S3 is a great place to store your WAL logs. We do the same thing.
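WAL-E hooks into PostgreSQL's `archive_command`, which runs once per completed WAL segment with `%p` (the segment's path) and `%f` (its file name) substituted, and must exit 0 only when the segment is safely stored. The sketch below is not WAL-E; it's a minimal illustration of that contract with a pluggable uploader. The boto3-based `s3_uploader`, the bucket name, and the script wiring are assumptions, and a production archiver also needs things this skips, such as refusing to overwrite an already archived segment, plus base backups and restore.

```python
import sys
from pathlib import Path
from typing import Callable, Sequence

# An uploader takes (local_path, object_name) and raises on failure.
Uploader = Callable[[str, str], None]


def s3_uploader(bucket: str, prefix: str = "wal/") -> Uploader:
    """Hypothetical S3 uploader built on boto3's upload_file (requires boto3)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    client = boto3.client("s3")

    def upload(local_path: str, name: str) -> None:
        client.upload_file(local_path, bucket, prefix + name)

    return upload


def archive_wal(wal_path: str, wal_name: str, upload: Uploader) -> int:
    """Return 0 only if the segment was stored, as archive_command requires.

    A non-zero result makes PostgreSQL keep the segment and retry later.
    """
    if not Path(wal_path).is_file():
        print(f"missing WAL segment: {wal_path}", file=sys.stderr)
        return 1
    try:
        upload(wal_path, wal_name)
    except Exception as exc:  # any failure must surface as a non-zero exit code
        print(f"archiving {wal_name} failed: {exc}", file=sys.stderr)
        return 1
    return 0


def main(argv: Sequence[str]) -> int:
    # Hypothetical postgresql.conf wiring:
    #   archive_command = 'python3 /path/to/archive_wal.py %p %f'
    # with the script ending in: sys.exit(main(sys.argv))
    if len(argv) != 3:
        print("usage: archive_wal.py <path> <name>", file=sys.stderr)
        return 2
    return archive_wal(argv[1], argv[2], s3_uploader("my-wal-bucket"))  # hypothetical bucket
```

Keeping the uploader as a parameter makes the exit-code logic testable without network access; swapping in a different object store only changes `s3_uploader`.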