Hacker News new | ask | show | jobs
by acidity 3411 days ago
>>> Weird. OK, I tried it with some other parameters, and it suddenly took 15 seconds at 100% CPU, with PostgreSQL doing some linear scan through data. Using EXPLAIN I found that with full production data the query planner was doing something idiotic in some cases. I learned how to impact the query planner, and then this query went back to taking only a few milliseconds for any input. With this one change to influence the query planner (to actually always use an index I had properly made), things became dramatically faster. Basically the load on the database server went from 100% to well under >>> 5%.

I am actually interested in this part. Figuring out issues with EXPLAIN is one of my favorite things.

2 comments

I'm replying due to this same part/paragraph. I've been dealing with some manifestation of EXPLAIN since Oracle 6. In the last few years I've become handy with PostgreSQL's EXPLAIN as well, and this reminds me of my biggest hang-up regarding PostgreSQL; their hostility toward optimizer hints.

Like Mr. Stein I too have found myself in bad places with PostgreSQL's optimizer. This is commonplace with relational systems; every such system I've ever dealt with, including all versions of Oracle since the mid 90's, Informix, MS-SQL, DB/2 (on AS/400, Windows and Linux,) and PostgreSQL eventually get handed a query and a schema that produces the wrong plan and has intolerably bad performance. No exception. None of these attempts to create flawless optimizers that anticipate every use case has ever succeeded, PostgreSQL included.

With other systems there are hints that, as a last resort, you can apply to get efficient results. Not so much with PostgreSQL. Not implementing the sort of hints that solve these problems (as opposed to the often ineffectual enable_* planner configuration, unacceptable global configuration and other workarounds needed with PostgreSQL) is policy:

"We are not interested in implementing hints in the exact ways they are commonly implemented on other databases. Proposals based on 'because they've got them' will not be welcomed."

How about proposals based on "because your hint-free optimizer gets it wrong and I require a working solution without too many backflips and somersaults or database design lectures." No? Then sorry; I can't risk getting painted into a corner by your narrow minded and naive policy. PostgreSQL goes no further than non-critical, ancillary systems when I have say in it. And I do.

You're absolutely correct. Optimizer hints are a source of contention in Postgres. I'm sure you're already aware, but just for completeness, here's some more information on optimizer hints and Postgres from Simon Riggs back in 2011. I don't expect this to sway opinion one way or the other. It's useful to read more on Postgres' policy.

http://blog.2ndquadrant.com/hinting_at_postgresql/

From the introduction:

- Introducing hints is a common source of later problems, because fixing a query place once in a special case isn’t a very robust approach. As your data set grows, and possibly changes distribution as well, the idea you hinted toward when it was small can become an increasingly bad idea.

- Adding a useful hint interface would complicate the optimizer code, which is difficult enough to maintain as it is. Part of the reason PostgreSQL works as well as it does running queries is because feel-good code (“we can check off hinting on our vendor comparison feature list!”) that doesn’t actually pay for itself, in terms of making the database better enough to justify its continued maintenance, is rejected by policy. If it doesn’t work, it won’t get added. And when evaluated objectively, hints are on average a problem rather than a solution.

- The sort of problems that hints work can be optimizer bugs. The PostgreSQL community responds to true bugs in the optimizer faster than anyone else in the industry. Ask around and you don’t have to meet many PostgreSQL users before finding one who has reported a bug and watched it get fixed by the next day.

Now, the main completely valid response to finding out hints are missing, normally from DBAs who are used to them, is “well how do I handle an optimizer bug when I do run into it?” Like all tech work nowadays, there’s usually huge pressure to get the quickest possible fix when a bad query problem pops up....

Having never used postgres, this seems extremely scary. I've only ever needed index hints a couple times in other databases, but when you need them there isn't usually an alternative. Messing with global knobs is a good way to cause more problems than you are solving.

I've even entertained the idea that every query should be hinted. For OLTP workloads, you practically always know exactly how you want the DB to execute your query anyways. And often times you find out very late that the query planner made the wrong choice and now your query is taking orders of magnitude longer than it should (worse, sometimes this changes at runtime). I've never actually gone through with this religiously though...

>> Having never used postgres, this seems extremely scary. I've only ever needed index hints a couple times in other databases, but when you need them there isn't usually an alternative. Messing with global knobs is a good way to cause more problems than you are solving.

You've got the plot exactly. The last such battle I was involved with ended in creating a materialized view to substitute for several tables in a larger join; without the view there was no way[1] to get an acceptable plan. Creating this view was effectively just a form of programming our own planner. And yes, the need to update the view to get the desired result is an ongoing problem; one that's scheduled to get solved with a migration to another DB.

Like you I've never been all that quick to employ hints. I tend to use them while experimenting during development or troubleshooting and avoid them in production code. But there have been production uses, and you know what? The world did not end. No one laughed at or fired me. No regulatory agency fined me. It did not get posted on Daily WTF. No subsequent maintenance programmer has ever shown up at my home in the dead of night. It just solved the problem, quickly and effectively.

Sure would be nice if people purporting to offer a fit-for-purpose relational systems understood the value of a little pragmatism.

[1] given the finite amount of time we could sacrifice to deal with it

Just to be clear, those settings are global per query, not global for the entire server. Still that makes them almost useless for large queries, but I would not exactly call them scary.
I added code to my clients to do "SET enable_nestloop TO off" anytime they connect to the database. This sets a global flag for that session, which disables nestloop query planning. It could indeed impact other unrelated queries, which is deeply disturbing, but I don't know any way to disable nestloop query planning only for a specific query (aside from constantly setting and unsetting that flag?). Incidentally, here is an example of the query that causes all the trouble: "SELECT * FROM file_use WHERE project_id = any(select project_id from projects where users ? '25e2cae4-05c7-4c28-ae22-1e6d3d2e8bb3') ORDER BY last_edited DESC limit 100;" Search for nestloop in https://github.com/sagemathinc/smc/blob/master/src/smc-hub/p... to see my relevant client code.
> I don't know any way to disable nestloop query planning only for a specific query

You can do that by using SET LOCAL. Here's what your query would become:

  BEGIN;
  SET LOCAL enable_nestloop = off;
  <QUERY>
  COMMIT;
SET LOCAL applies the setting, but only within the current transaction.

If you post a link to the output from EXPLAIN, I could probably advise you the right way to handle this query.

Thanks!! Also, here's EXPLAIN, examples with timings, and the relevant part of the database schema: https://gist.github.com/williamstein/fb31e07d4057232bd3a3e78... (My email: wstein@sagemath.com)
It's a bit difficult to tell what exactly is going on. For future reference EXPLAIN ANALYZE provides a lot more information into the execution of the query. It tells you how how much time was spent in each part of the plan as well as how many rows were produced by each part.

From what I can tell, this query is getting the 100 last edited files of projects a user is part of. The way it is currently executing is by iterating through the most recently edited files, sees if the user belongs to the project of the file, and repeats until it finds 100 files of projects the user belongs to. Since the query returns no results, I'm you are running the query for a user that is not a part of any project, or of only empty projects. This means the query is looking up the projects of every single file only to find that none of them belong to a project the user was a part. You can check this by running EXPLAIN ANALYZE.

I'm not sure, since you didn't post the EXPLAIN of the query with enable_nestloop = off, but here's what I think is happening. You are getting a merge join between the projects table and the file_use table with the file_use_project_id_idx. If this is correct, this means Postgres first scans through all of the projects and finds the ones the user belongs to. Then it looks up all of the files that are part of one of those projects. Then it sorts those files by the time they were last edited and takes the top 100. I'm not sure if that is what's exactly happening, but I'm sure something similar to it is. You can check how accurate my guess is by running EXPLAIN/EXPLAIN ANALYZE.

The first thing I would try is creating a GIN index on the users field which can be done with the following:

  CREATE INDEX ON projects USING GIN (users jsonb_ops). 
What I would expect to see is a nested loop join between projects and files_used. The query should use the GIN index to find all projects the user belongs to. Then use the file_use_project_id_idx to get the files for each of the projects. Then sort the files by the last time they were edited and take the top 100.