Hacker News new | ask | show | jobs
by megalord 885 days ago
> No matter what indexes we threw at the table

I don't have information on how team investigated the whole problem. What table looks like? What indexes exactly? Was there any explain analyzing? 28 millions are not as many as it seems. We have similar sized table but when indexed properly, query is always quick

3 comments

I think that is a good question. It could be the queries were not very amenable to using indexes. In the article they also mentioned the query was taking 'minutes' which seems to indicate it is not an issue due to DB load or because the database is under resourced. I can understand that even if you have everything properly indexed if the DB isn't big enough or you are receiving too many queries then PG can still struggle with queries hitting 28 million rows. We have at least an order of magnitude more rows in one of our biggest tables and are doing paging queries against it and never receive timeouts. We use RDS which I guess is not super great for PG performance and are not even close to using the biggest instance.
If it's hitting an index properly it should perform binary search on it, and just be able to locate the row, correct?

They also say "We had a very large table (28 million) rows that we were getting timeouts querying on anything other than the primary key."

Which implies that indexing did work if that was primary key. How could other indexes fail? They should work with similar performance. Doesn't that imply they must've done something wrong or there was some sort of bug somewhere? Or were the column in some fashion I can't think of?

Also what I think. And it feels like partitioning would imply much more work and potential future implications rather than figuring out what was the issue with some queries or indexes. Postgres should be able to handle billions of rows.

And if the issue was with the queries per second, it would feel to me like it would still be less disruptive to just have some sort of replication going on for reads.

And partitioning really makes sense to me only if you are going to have it on multiple boxes, because otherwise I don't think there really should be many performance gains compared to just indexing?

I'm not seeing if/how they put it on multiple boxes?

A big benefit of partitioning is being able to drop partitions that expire instead of doing deletes. Assuming you’ll ever need to delete of course!
Agreed, it sounds more like they had poorly-written queries and/or grossly incorrect table statistics.