Hacker News new | ask | show | jobs
by tharkun__ 1871 days ago
Disclaimer: It's been a while, but my very first job out of uni was dealing with large queries like that for reporting and import/export out of our OLTP database directly.

15 tables would've been not very common for us, but something between 5 and 10 was normal. Many of those tables would've had millions upon millions of rows (while some were simple 'key tables' with only hundreds or a few thousand records)

The one thing that I learned really fast is that yes, "prefiltering/subgrouping" as the OP calls it, is very important. If you cut down on the number of rows that your query "starts out with" was very very important, as it cuts down on the amount of data that needs to be dealt with in the rest of the query. This was for an old Sybase ASE based system. IIRC, ASE would only be able to automatically optimize this across 4 clauses (might misremember the number and I left before they upgraded to the new newer version with a better optimizer), so ordering of your join clauses was important. If the filtering that cut down on the amount of data needed from other tables came first, your query would run way faster, than if the filters were way down with the rest of the joins.

Just think about it, if you start out with getting 6 million patient records and then start collecting 6 million records from the next table for a join and so on and so forth, that's way more data that needs to be read and churned through than if you can 'start on the other end' so to speak, whittle it down to say 50000 records that now need to be looked up in the patient table.