Hacker News new | ask | show | jobs
by ashray 4985 days ago
> The key issue is that this widget performs a query on the wp_comments table which becomes slower and slower as the number of comments rises..

I dont' see why a selective read based on primary key indexes is taking so long. How many rows are we talking about here ? I'm sure it's under a billion rows - right ? Partitioning it may help but I think there's something significantly wrong with the way the lookup is going through.

The recent comments widget is probably doing things the wrong way around. Is it caching the returned values ? Does it matter if the recent comments widget is based on data that is.. say 2 minutes old ?

I really don't agree with your performance bottlenecks. I run several sites much larger (at-least 20x traffic per day) and granted that they do not all run Wordpress, I know from experience that your system should not be grinding to a halt with just 50k daily uniques.

Even imagining that the comments lookup is like "get all comments for post ID, 2966 --" - that can't possibly be slow enough to destroy your entire server setup. Maybe you just need more RAM ?

Get a dedicated box and get someone to optimize it. If your entire database can be in RAM (Get a box with 16GB of RAM ?), that should work well. That won't cost you $250/month even.

EDIT: I see from another post: As Sean, my head SysAdmin wrote on a ticket to you, some of your pages were performing a sort on 188590 rows in memory each time.

Was that the recent comments widget doing that ? Well, that can be fixed with some clever caching or even a clever fetch technique!

Is that necessary for every load ? Do you have a particular blog post with that many comments ? The right way to go about that would be to pick out comments for a particular post and then sort them - query-wise. Again, your WP install probably needs some (much needed) tuning.

I understand your desire to have someone else worry about your servers, but someone also (apparently) has to worry about your code, and as I understand right now - that person is you. Well, you could definitely make some code level improvements and save on the 'server management' department.

2 comments

I think it may be this "widget" (I don't know the WP ecosystem, but this seems reasonable): http://wordpress.org/extend/plugins/recent-comments-widget-w...

I only see one query in there: $comments = $wpdb->get_results("SELECT $wpdb->comments.* FROM $wpdb->comments JOIN $wpdb->posts ON $wpdb->posts.ID = $wpdb->comments.comment_post_ID WHERE comment_approved = '1' AND post_status = 'publish' ORDER BY comment_date_gmt DESC LIMIT 150");

I'm not sure why they are fetching 150 (why would you use that many?) but this query could be slow if you have many posts and many comments. Also, the goofy thing is that they are wiping out the cache every time and regenerating it instead of (say) incrementally adding to it.

I'd re-write it with an exists, which may help performance instead of a join (I've seen MySQL be pretty dumb with joins when you basically want an exists):

SELECT whatever FROM comments WHERE EXISTS( SELECT * FROM posts WHERE ... ) AND comment_approved = 1 ORDER BY LIMIT xx

In this case, MySQL may be having a problem where it can't read a multi-column index on post_id and comment_date_gmt DESC (I'm not sure if you can specify ordering in MySQL indexes, been a while), thus it's basically doing an in-memory sort of every approved comment (on a published post, which is likely all of them) in the system, which in his case was about ~200K. This could definitely take several seconds if there isn't enough memory to do it all at once (lots of swapping).

Personally, I'd try and modify the logic to be more conducive to indexing - do you really care if a comment was on a post that you made 10 years ago? MySQL also (IIRC) doesn't have bitmap indexes, so that "comment_approved" thing may or may not invalidate an index usage unless you add it to the index.

I'd probably do something like: the "n" (where n is < 150) most recent comments to a post made within the last n number of days, so you can really cut down on the number of rows you need to sort.

I don't think that it's unreasonable to expect that a mainline piece of code (the Recent Comments widget) should work well on a service dedicated to running the mainline code.

But as I said elsewhere, the performance thing is just the straw that broke the camel's back.