| HN Mirror

This is a thing that people say, but this is incorrect.

Classic example: Multitenant DB. I have a user table. I have a document table. I have a client table. documents and clients are key'd to user. In particular, there's a property of the system that document.client.user == document.user.

I do a "naive" query to count the number of documents per client, like:

- select count(*), client_id from document where document.text ilike %searchstring% and client_id in (select id from client where user_id = 1) group by client_id;

Now, it turns out that since I have this multitenant data, I have indexes alongside things like (user_id, client_id) on document. I also want to support text search so I have an full-text-search index on (user_id, document.text).

Without properly "guiding" the filtering (usually re-repeating the "where user_id = 1" fragment on every joining table) I will not be able to take advantage of my FTS index, because Postgres decides that's not slow (and _doesn't know about the relation between document.client.user_id and client.use_id). It instead tries hard to implement this either by pulling all of the client documents into memory and then doing an ad-hoc search there, or doing a FTS across all tenants (because I have an index on document.text for cross-tenant searching in an admin), and then scanning through that and removing by client_id.

This is a query plan I would not write if I wrote it myself, because it would be "obviously wrong". It would be a lot of work and clearly incorrect for what I want.

Meanwhile people writing features like this add indexes specifically to support certain workflows, and are often blissfully unaware that those indexes are not being used when it makes the most sense. Of course you gotta check experimental data and measure etc etc. But I like the idea of doing more data design before you start having performance issues, and I think that explicit index usage in particular would be a huge benefit on that front.

(there is a "fix" for the original query, involving repeating the user_id filter at other levels)