Hacker News new | ask | show | jobs
by phoboslab 1021 days ago
I'd argue that if you have an IN() with more than a dozen values, you're doing it wrong. Especially so if that's something that your app does often and it needs to be fast (as opposed to one-of analytic queries where you don't care about performance).
1 comments

If your data is intrinsically shaped like 1:100, what are you supposed to do - issue 10x batches of fetches with 10 elements? Give up on a relational database? Start designing custom sharding schemes to avoid doing the "wrong" thing that might take a few ms longer?

And I'll have you know that we do actually care about the performance of analytic queries. The difference in performance might be minutes vs hours, or hours vs days. Just because it's not as quick as point lookup doesn't mean we are completely time insensitive.

If you're issuing a statement that includes an 'IN' statement with thousands of values, where are those values coming from, how are you getting them? In a relational database, a so large list of values generally indicates that they aren't arbitrary but would come from some table (perhaps with some condition), so the selection would be implemented with a table join - if you're fetching a huge list of IDs from the database and then passing them back as part of a query, that doesn't seem right.

In the grandparent comment, "Suppose you are loading thousands, if not millions of objects, and you wish to preload all of their children, your ORM might simply do this by running two SELECTs ..." - well if your ORM is doing something stupid, then you either fix the ORM or work around the ORM to get a reasonable query that will not uselessly send these millions of object IDs back to the database - what a waste of IO.

I was very confused about your comment (because the obvious answer is to use a JOIN), but then I realized you're talking about using a sharded database. To which my response is: don't. Just buy more RAM.

I realize that this doesn't scale indefinitely, but for the 99% of us who just need to manage a few billion rows, it's the right answer.