Hacker News new | ask | show | jobs
by barrkel 1021 days ago
If your data is intrinsically shaped like 1:100, what are you supposed to do - issue 10x batches of fetches with 10 elements? Give up on a relational database? Start designing custom sharding schemes to avoid doing the "wrong" thing that might take a few ms longer?

And I'll have you know that we do actually care about the performance of analytic queries. The difference in performance might be minutes vs hours, or hours vs days. Just because it's not as quick as point lookup doesn't mean we are completely time insensitive.

2 comments

If you're issuing a statement that includes an 'IN' statement with thousands of values, where are those values coming from, how are you getting them? In a relational database, a so large list of values generally indicates that they aren't arbitrary but would come from some table (perhaps with some condition), so the selection would be implemented with a table join - if you're fetching a huge list of IDs from the database and then passing them back as part of a query, that doesn't seem right.

In the grandparent comment, "Suppose you are loading thousands, if not millions of objects, and you wish to preload all of their children, your ORM might simply do this by running two SELECTs ..." - well if your ORM is doing something stupid, then you either fix the ORM or work around the ORM to get a reasonable query that will not uselessly send these millions of object IDs back to the database - what a waste of IO.

I was very confused about your comment (because the obvious answer is to use a JOIN), but then I realized you're talking about using a sharded database. To which my response is: don't. Just buy more RAM.

I realize that this doesn't scale indefinitely, but for the 99% of us who just need to manage a few billion rows, it's the right answer.