Just out of interest, but how large was the "moderately sized" IN() clause? I am asking, as we are looking at postgresql as an alternative to MySQL, and we have some queries currently with up to 5000 values inside IN()
I don't remember specifically, but I believe it was in the thousands to low tens of thousands of rows. It's also not consistent. I've seen larger IN() clauses that never have a problem, and smaller ones that consistently do. It's been on my very low priority to-do list to put together some demo cases for the mailing lists, because overnight to < 4s just from that little refactor isn't the greatest...
Are we talking about IN clauses that contain a correlated subquery or something the optimizer would have a hard time determining was independent of outer context?
PostgreSQL is able to convert correlated subqueries with IN () clauses into joins in most cases. My guess is that it could have been two queries refactored into one.
This is not super fast in MySQL either, compared to the alternative of creating a temporary table with your 5000 values in it and doing a join on that.
That's a problem I routinely come across, and it's frustrating, because there's no reason a self-contained IN(SELECT) should ever be slower than two-stepping it with a temporary table, or worse, two queries on the client side. But it often is.