| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hennk 5168 days ago
	Just out of interest, but how large was the "moderately sized" IN() clause? I am asking, as we are looking at postgresql as an alternative to MySQL, and we have some queries currently with up to 5000 values inside IN()

2 comments

rosser 5168 days ago

I don't remember specifically, but I believe it was in the thousands to low tens of thousands of rows. It's also not consistent. I've seen larger IN() clauses that never have a problem, and smaller ones that consistently do. It's been on my very low priority to-do list to put together some demo cases for the mailing lists, because overnight to < 4s just from that little refactor isn't the greatest...

link

tmoertel 5168 days ago

Are we talking about IN clauses that contain a correlated subquery or something the optimizer would have a hard time determining was independent of outer context?

link

jeltz 5168 days ago

PostgreSQL is able to convert correlated subqueries with IN () clauses into joins in most cases. My guess is that it could have been two queries refactored into one.

link

spudlyo 5168 days ago

This is not super fast in MySQL either, compared to the alternative of creating a temporary table with your 5000 values in it and doing a join on that.

link

sjwright 5167 days ago

That's a problem I routinely come across, and it's frustrating, because there's no reason a self-contained IN(SELECT) should ever be slower than two-stepping it with a temporary table, or worse, two queries on the client side. But it often is.

link