by cluda01 5091 days ago
I would very much appreciate it if you elaborate on why managing your own MySQL database is the best option. I'm currently moving from back end proprietary systems development to web development and would like to hear your considerations.
4 comments

The short answer is that we believe running your own instances gives better performance and reliability than RDS. However, the cost is complexity: I'm a relatively experienced DBA, and we've since hired a second person with deep MySQL experience. If MySQL admin is something you don't want to spend much time doing, you may be willing to make the performance sacrifice of RDS.

The longer answer is that we don't use RDS because it relies on EBS, and we do not trust EBS for any critical applications. Instead, we put our data on instance storage (aka "ephemeral" storage).

This has two big disadvantages:

a) portability: you can't detach the drive and move it to a new instance like you can with EBS -- to clone or back up, you have to copy over the network, which is much slower (and obviously, if you kill the instance, you lose the data).

b) storage: you are limited in how big your DB can be. An AWS large instance these days gives you nearly 1TB of instance storage, but if you have a single DB larger than that, you need to use EBS if you're on Amazon. (Of course, if you care about performance and your database is > 1TB, you should probably be looking at sharding across multiple machines anyway)

However, using instance storage has two big advantages that we think outweigh those:

a) performance. EBS is basically a network drive. Total I/O operations per second (IOPS) are punishingly low. If you have a high transaction rate on your database you're going to really hate it. As I mentioned, RDS tries to mitigate this by using multiple EBS drives, but we consider that a band-aid on a pretty fundamental problem with EBS. Instance storage, on the other hand, is physically local to the VM's host machine, and is therefore much faster.

b) reliability. After 3 years on AWS, our trust in EBS is zero. It fails too often, and its failure pattern is awful: you tend to lose big batches of EBS drives at the same time, and whenever there has been a major EBS failure, the API used to launch replacement volumes has failed at the same time, making replacement impossible. Again, we think this is a fundamental problem with the nature of EBS and unlikely to change.

Thanks seldo, interesting. I've just been looking at a client's Amazon dashboard where they have a small set-up running. It's not something I normally deal with, but I see their RDS is billing over 2e9 I/Os/month, which ends up being a significant part of the non-fixed bit of their bill. I suspect their MySQL queries are doing table scans and building temporary tables for some of the queries; would these both push up the I/O count, since all RDS storage is EBS, even temporaries?
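For a sense of scale, here is a rough back-of-the-envelope sketch of what 2e9 I/Os/month means as a sustained rate and as a line item. The function names are made up for illustration, the 30-day month is an assumption, and the price of 0.10 USD per million requests is only a placeholder; substitute whatever rate actually appears on the bill.

```python
# Back-of-the-envelope arithmetic for a monthly I/O request count.
# Assumption: a 30-day billing month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_iops(io_requests_per_month: float) -> float:
    """Average I/O operations per second sustained over the month."""
    return io_requests_per_month / SECONDS_PER_MONTH


def monthly_io_cost(io_requests_per_month: float, usd_per_million: float) -> float:
    """Cost of the I/O requests at a given price per million requests."""
    return io_requests_per_month / 1_000_000 * usd_per_million


if __name__ == "__main__":
    ios = 2e9  # figure quoted in the comment above
    print(f"average rate: {average_iops(ios):.0f} IOPS")
    # Placeholder price, not taken from the thread.
    print(f"I/O cost: {monthly_io_cost(ios, usd_per_million=0.10):.2f} USD")
```

Note that the average hides bursts: a workload doing table scans could spend most of the month idle and still hit much higher peaks.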

So if your MySQL storage is ephemeral how do you cope with outage? Replicate it off AWS?

I believe MySQL's working directory is on EBS, so yes, even temporary tables would be on EBS -- don't quote me on that, though.

Re: outages, we use multiple replicated servers in different availability zones -- an outage is usually (though not always!) limited to a single zone. For a region-wide outage, we have emergency backups being sent to a different AWS region (east -> west), and if shit completely hits the fan we have off-AWS backups.
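The fallback order described above (replica in another zone, then cross-region backup, then off-AWS backup) can be sketched roughly like this. It is only an illustration of the tiering idea, not the actual tooling; the `DataCopy` type, the tier numbers, and the copy names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DataCopy:
    """One copy of the database (hypothetical model for illustration)."""
    name: str
    tier: int        # 0 = replica in another AZ, 1 = backup in another region, 2 = off-AWS backup
    available: bool  # whether this copy is reachable right now


def pick_recovery_source(copies: Sequence[DataCopy]) -> Optional[DataCopy]:
    """Return the reachable copy with the lowest tier, or None if nothing is reachable."""
    candidates = [c for c in copies if c.available]
    return min(candidates, key=lambda c: c.tier, default=None)


if __name__ == "__main__":
    # Simulated region-wide outage: every replica in the region is unreachable.
    copies = [
        DataCopy("replica-zone-b", tier=0, available=False),
        DataCopy("replica-zone-c", tier=0, available=False),
        DataCopy("backup-west", tier=1, available=True),
        DataCopy("backup-off-aws", tier=2, available=True),
    ]
    source = pick_recovery_source(copies)
    print(source.name if source else "no copy reachable")
```

The point of ordering by tier is that each step down is slower to restore from, so you only fall back when the cheaper option is gone.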

Just out of curiosity, do you have any thoughts on DynamoDB, or have you played with it? Not as a "would you replace what you're doing with DynamoDB" but more a "here's a niche where we think it would work really well"?

I know it's very new, so I haven't seen any advice on it, whereas I don't think I've ever seen a pro-EBS point of view from people with non-trivial experience with it.

Speaking theoretically, Dynamo is some really clever tech built by some very smart people -- it's clear that Amazon are using something very similar internally, so it must work in practice. Beyond that I've no direct experience with it or its performance profile.

If I had a very large, rapidly-growing key-value application and a shortage of experienced ops engineers that made maintaining my own solution (e.g. a Cassandra cluster) impractical, I would look hard at Dynamo.

However, as a matter of principle I am very suspicious of the lock-in that comes with proprietary solutions, no matter how clever. We try not to buy cloud services that only have one vendor.

If you run your own stack, you get lower prices and more flexibility. On the flip side, if you don't know how to administer MySQL, you'll have some learning to do.

Dedicated server pricing is 1/2 or less of what Amazon offers you, and you get better performance to boot. Seems like a no brainer to me (but then again I've been doing "dev ops" stuff since the late 90s and learned many lessons the hard way).

Seldo said they run their own DBs, not that they run their own MySQL instances. Being able to run other engines (e.g. PostgreSQL) is no doubt a benefit of running your own DBs.
Cost might be a reason. Amazon has competition for MySQL hosting, keeping prices relatively low. DynamoDB seems expensive in comparison.