Hacker News new | ask | show | jobs
by williamstein 3411 days ago
I added code to my clients to do "SET enable_nestloop TO off" anytime they connect to the database. This sets a global flag for that session, which disables nestloop query planning. It could indeed impact other unrelated queries, which is deeply disturbing, but I don't know any way to disable nestloop query planning only for a specific query (aside from constantly setting and unsetting that flag?). Incidentally, here is an example of the query that causes all the trouble: "SELECT * FROM file_use WHERE project_id = any(select project_id from projects where users ? '25e2cae4-05c7-4c28-ae22-1e6d3d2e8bb3') ORDER BY last_edited DESC limit 100;" Search for nestloop in https://github.com/sagemathinc/smc/blob/master/src/smc-hub/p... to see my relevant client code.
1 comments

> I don't know any way to disable nestloop query planning only for a specific query

You can do that by using SET LOCAL. Here's what your query would become:

  BEGIN;
  SET LOCAL enable_nestloop = off;
  <QUERY>
  COMMIT;
SET LOCAL applies the setting, but only within the current transaction.

If you post a link to the output from EXPLAIN, I could probably advise you the right way to handle this query.

Thanks!! Also, here's EXPLAIN, examples with timings, and the relevant part of the database schema: https://gist.github.com/williamstein/fb31e07d4057232bd3a3e78... (My email: wstein@sagemath.com)
It's a bit difficult to tell what exactly is going on. For future reference EXPLAIN ANALYZE provides a lot more information into the execution of the query. It tells you how how much time was spent in each part of the plan as well as how many rows were produced by each part.

From what I can tell, this query is getting the 100 last edited files of projects a user is part of. The way it is currently executing is by iterating through the most recently edited files, sees if the user belongs to the project of the file, and repeats until it finds 100 files of projects the user belongs to. Since the query returns no results, I'm you are running the query for a user that is not a part of any project, or of only empty projects. This means the query is looking up the projects of every single file only to find that none of them belong to a project the user was a part. You can check this by running EXPLAIN ANALYZE.

I'm not sure, since you didn't post the EXPLAIN of the query with enable_nestloop = off, but here's what I think is happening. You are getting a merge join between the projects table and the file_use table with the file_use_project_id_idx. If this is correct, this means Postgres first scans through all of the projects and finds the ones the user belongs to. Then it looks up all of the files that are part of one of those projects. Then it sorts those files by the time they were last edited and takes the top 100. I'm not sure if that is what's exactly happening, but I'm sure something similar to it is. You can check how accurate my guess is by running EXPLAIN/EXPLAIN ANALYZE.

The first thing I would try is creating a GIN index on the users field which can be done with the following:

  CREATE INDEX ON projects USING GIN (users jsonb_ops). 
What I would expect to see is a nested loop join between projects and files_used. The query should use the GIN index to find all projects the user belongs to. Then use the file_use_project_id_idx to get the files for each of the projects. Then sort the files by the last time they were edited and take the top 100.