| The server hosting one of our live apps is under high loads and gets into a state where it stops responding to http requests several times a day. These outages last typically anywhere from a few minutes all the way up to half an hour or so. The major culprit here is MySQL load and we've been working on optimizing this (those interested can refer to my previous thread on this: https://news.ycombinator.com/item?id=6348903). For now we've found that restarting the httpd and mysqld services brings things back to normal almost immediately. While we continue to work on a - more elegant - solution, we're thinking of writing a bash/shell script that runs on hourly cron, checks load average (via uptime or /proc/loadavg) and if found higher than a threshold, restarts the services. Can anyone think of any downside to this (used at least as a temporary measure)? |
Also, consider creating a buffer in the application layer that buffers inserts/updates and executes them once as a single transaction, if they don't need to be executed immediately. It puts less stress on the database. Of course, this would require a lot of rewriting in your app, so not sure if you want to go through this route.
Indices are another area. I'm sure plenty of people have told you to optimize your indices, but also consider REMOVING unnecessary indices. Do you have an index on a text column, or multiple varchar columns? Those can be killer after awhile because inserts will slow down. Consider changing indices on varchar columns to indices on an int column by hashing those strings.
A quick suggestion: Install NewRelic (it's free for a certain period), and check out the database transactions that are taking up the most CPU load. Sometimes there's that 1 query you overlooked that is table scanning and could be the main culprit.
Also, are you using Rails by any chance? If so, there are other areas I can suggest.
And please post your server specs. Maybe your VPS just does suck (no offense), and the easiest route is just to upgrade your server.