| I'll add some of mine: Learn your DB server. Check the query plans often. You might get surprised. Tweak and recheck. Usually EXISTS is faster than IN. Beware that NOT EXISTS behaves differently than EXCEPT in regards to NULL values. Instead of joining tables and using distinct or similar to filter rows, consiser using subquery "columns", ie in SELECT list. This can be much faster even if you're pulling 10+ values from the same table, even if your database server supports lateral joins. Just make sure the subqueries return at most one row. Any query that's not a one-off should not perform any table scans. A table scan today can mean an outage tomorrow. Add indexes. Keep in mind GROUP BY clause usually dictates index use. If you need to filter on expressions, say where a substring is equal something, you can add a computed column and index on that. Alternatively some db's support indexing expressions directly. Often using UNION ALL can be much faster than using OR, even for non-trivial queries and/or multiple OR clauses. edit: You can JOIN subqueries. This can be useful to force the filtering order if the DB isn't being clever about the order. |
One interesting thing I found about Postgres that's probably true of others too, often you can manually shard INSERT (SELECT ...) operations to speed them up linearly with the number of CPU cores, even when you have like 10 joins. EXPLAIN first, find the innermost or outermost join, and kick off a separate parallel query operating on each range of rows (id >= start AND id < end). For weird reasons, I relied on this a lot for one job 6 years ago. Postgres has added parallelism in versions 10+, but it's still not this advanced afaik.