by mithaler 5566 days ago
We were bitten by EBS' slowness at my company recently, when moving an existing project to AWS. You effectively can't get decent performance off of a single EBS volume with PostgreSQL; you need to set up 10 or so of them and make a software RAID to remove the bottleneck. It's a fairly large time commitment to build and maintain, but it's pretty fast and reliable once it's up and running (cases like the recent downtime notwithstanding).
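For anyone wondering what "make a software RAID" looks like in practice, here is a minimal Python sketch that only builds (does not run) an mdadm command striping ten volumes into one RAID 0 array. The device names /dev/xvdf through /dev/xvdo, the /dev/md0 target, and the 256 KiB chunk size are illustrative assumptions, not what the poster used.

```python
import shlex


def build_mdadm_create(devices, md_device="/dev/md0", level=0, chunk_kib=256):
    """Return the mdadm argv that would stripe `devices` into one md array."""
    if len(devices) < 2:
        raise ValueError("striping needs at least two member volumes")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--chunk={chunk_kib}",  # chunk size in KiB
        f"--raid-devices={len(devices)}",
        *devices,
    ]


# Hypothetical names for ten attached EBS volumes: /dev/xvdf .. /dev/xvdo.
devices = [f"/dev/xvd{c}" for c in "fghijklmno"]
argv = build_mdadm_create(devices)
print(shlex.join(argv))
```

You would still need to create a file system on the array and mount it; the poster doesn't say which RAID level or options they actually chose.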

Can anyone tell me if MySQL fares any better than Postgres on a single EBS volume? I wouldn't assume it does but I shouldn't be making assumptions.


MySQL does not fare any better on a single EBS volume. The issues with EBS are systemic. As with Postgres, you have to RAID several volumes together to see decent performance, and this is the solution AWS recommends.
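Striping helps because consecutive chunks of the array land on different volumes, so one large read fans out across all of them. A minimal sketch of the round-robin RAID 0 address mapping, assuming equal-sized members; the 256 KiB chunk and ten-volume count are illustrative only:

```python
CHUNK = 256 * 1024  # illustrative 256 KiB chunk size


def raid0_locate(offset, n_devices, chunk_bytes=CHUNK):
    """Map a logical byte offset to (member index, byte offset on that member).

    Assumes equal-sized members laid out round-robin, chunk by chunk.
    """
    chunk_index, within = divmod(offset, chunk_bytes)
    member = chunk_index % n_devices
    stripe = chunk_index // n_devices  # full rows of chunks before this one
    return member, stripe * chunk_bytes + within


def members_touched(offset, length, n_devices, chunk_bytes=CHUNK):
    """Set of member volumes that a contiguous read of `length` bytes hits."""
    first = offset // chunk_bytes
    last = (offset + length - 1) // chunk_bytes
    return {c % n_devices for c in range(first, last + 1)}


print(raid0_locate(10 * CHUNK + 5, 10))              # (0, 262149)
print(len(members_touched(0, 4 * 1024 * 1024, 10)))  # 10
```

With these numbers a single 4 MiB sequential read is split across all ten volumes instead of queuing on one.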
Did you use RAID 10? I would love to see a post on using PostgreSQL with EC2/EBS: how to set up RAID, etc.
A while back, Orion Henry at Heroku wrote about this, describing several software RAID configurations and the performance characteristics of each:

http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...

Yes, but as a lowly developer, I have no idea how to set read-ahead buffers or change I/O schedulers.

Plus, that's a year old; I'd love to see some updated advice. You'd think Amazon would write more guides like this.

Well, that's really just "blockdev --setra", a few file system mount options, and some mdadm (Linux software RAID) configuration options. Yes, there's a bit of a learning curve and some pain in getting things set up, but it's not completely out of reach.
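To make that a little less opaque, here is a minimal Python sketch that builds the commands rather than running them. blockdev --setra takes a count of 512-byte sectors, and the I/O scheduler is set per block device through /sys/block/<dev>/queue/scheduler. The 4096 KiB readahead and the "deadline" scheduler are placeholder choices (newer multiqueue kernels call it "mq-deadline"), not values taken from the Heroku post.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_sectors(kib):
    """Convert a readahead size in KiB into the sector count blockdev expects."""
    return kib * 1024 // SECTOR_BYTES


def tuning_commands(md_device, members, readahead_kib=4096, scheduler="deadline"):
    """Build (not run) readahead and I/O scheduler commands as argv lists."""
    cmds = [["blockdev", "--setra", str(readahead_sectors(readahead_kib)), md_device]]
    for dev in members:
        name = dev.rsplit("/", 1)[-1]  # "/dev/xvdf" -> "xvdf"
        # The scheduler is a per-device sysfs setting, so write it for each member.
        cmds.append(["sh", "-c", f"echo {scheduler} > /sys/block/{name}/queue/scheduler"])
    return cmds


for cmd in tuning_commands("/dev/md0", ["/dev/xvdf", "/dev/xvdg"]):
    print(cmd)
```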

Although the post is relatively old, I think the advice and approach still hold. Clearly, EBS hasn't improved since then, and striping across EBS volumes is still necessary.

I found a 2008 benchmark that details the problems with RAID 10 and cited it in a comment above [1]. These are just raw disk transfer numbers, though; I can only imagine how they would change as CPU usage and Postgres load climb. IIRC, EBS disk I/O is network traffic, and network traffic is CPU-dependent, so as load increases, I/O will suffer greatly.

[1] http://news.ycombinator.com/item?id=2341425

Build-out Script for Postgres/PostGIS with RAID 10 on Amazon EBS volumes: http://sproke.blogspot.com/2010/12/build-out-script-for-post...
I second that.
Did you do any performance tweaking to PostgreSQL with respect to EBS? You have an insanely deep write buffer and quite good random read performance with EBS, which is nothing like the disks people normally deploy PostgreSQL to.
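A hedged sketch of the kind of starting-point settings people usually change first. The PostgreSQL documentation suggests about 25% of RAM for shared_buffers on a dedicated server; the 75% effective_cache_size and the lowered random_page_cost of 2.0 (the default is 4.0), meant to reflect EBS's comparatively good random reads, are assumptions to benchmark, not the poster's settings:

```python
def suggest_settings(ram_mib, random_page_cost=2.0):
    """Starting-point postgresql.conf values from rules of thumb (benchmark before trusting)."""
    return {
        "shared_buffers": f"{ram_mib // 4}MB",            # ~25% of RAM
        "effective_cache_size": f"{ram_mib * 3 // 4}MB",  # planner hint, ~75% of RAM
        "random_page_cost": str(random_page_cost),         # default is 4.0
    }


def render(settings):
    """Format settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Example: a machine with 7.5 GiB of RAM.
print(render(suggest_settings(7680)))
```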
I tuned the hell out of our big PostgreSQL instance a year ago, but I'll be damned if I can remember the rationale for every change. I have a list of all the changes from the defaults, but I've long since forgotten or lost the reasons for making them.

That being said, we get more bang for our buck by spreading our data across many small databases that don't need much tuning beyond upping the memory defaults. The EC2 cloud isn't great for the uber-server, but it's halfway decent for many small servers.
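One way to spread data across many small databases is to route each key to a shard with a stable hash. The poster doesn't say how they partition, so this is only an illustration; the DSNs are hypothetical, and plain modulo hashing reassigns most keys when the shard count changes (consistent hashing avoids that):

```python
import hashlib


def shard_for(key, dsns):
    """Pick one database for `key`; the same key always maps to the same DSN."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shard count.
    return dsns[int.from_bytes(digest[:8], "big") % len(dsns)]


# Hypothetical connection strings for four small PostgreSQL servers.
DSNS = [f"postgresql://db{i}.internal/app" for i in range(4)]

print(shard_for("user:42", DSNS))
```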