Hacker News new | ask | show | jobs
by monero-xmr 822 days ago
Postgres is simply the best. One thing I would like however is the ability to have control over the query planner for specific tasks. There is a dark art to influencing the query planner, but essentially it is unpredictable, and postgres can get it consistently wrong in certain scenarios. If you could just enable a special query mode that gives you absolute control over the QP for that query, it would solve a major pain point.

I'm not a database developer, and last time I researched this (a few years ago) I found many good reasons for not enabling this from postgres contributors. But it would still be very useful.

6 comments

pg_hint_plan —— Give PostgreSQL ability to manually force some decisions in execution plans. https://github.com/ossc-db/pg_hint_plan
Cool, that looks super useful. Hadn't come across it before. :)

---

Just did a HN submission for it (https://news.ycombinator.com/item?id=39712211) now too.

The problem is not the query planner per se. There is a much more subtle problem and it is related to how you have created the query in the join structure.

For many queries, the order in which you specify the joins doesn't really matter. But there are a number of classes where the join order dramatically affects how fast the query can actually run and nothing the query planner does will change this.

I came across this problem around 30 years ago. By accident, I discovered what the problem cause was - the order of the joins. The original query was built and took 30 - 40 minutes to run. I deleted particular joins to see what intermediate results were. In reestablishing the joins, the query time went to down to a couple of seconds.

I was able to establish that the order of joins in this particular case was generating a Cartesian product of the original base records. By judicious reordering of the joins, this Cartesian product was avoided.

If you are aware of this kind of problem, you can solve it faster than any query planner ever could.

Usually that kind of problem is a result of exceeding {from,join}_collapse_limit, which defaults to 8. If you have more tables than that in a query, Postgres doesn’t exhaustively try all ordering to determine the best, and instead uses its genetic algorithm, which can be worse.

You can raise the limit at the risk of causing query planning times going up exponentially, or refactor your schema, or, as you did, rewrite it to be more restrictive out of the gate. That way, those join paths will be found first and so will be the best found when the planner gives up.

Would love to see a practical example of this.

Another thing to consider is table fragmentation. Fragmentation > bad row count estimation > bad query plan.

We hit some weird query plans. I don't have forensic evidence but here's an example: https://crossref.gitlab.io/engineering/decision-records/dr-0...

Combination of two joins, filtering on all tables, and sorting by the LEFT one. Performance was fine, until we hit the scale when it suddenly became unpredictable.

In hindsight, given the variability of the queries and table structure, I don't think any query planner could have done a good job. The natural answer was to denormalize. But the journey to get there was a little unpredictable.

I’m currently learning the basics of this. Currently struggling with multiple similar scenarios where switching from a left to an inner join, or any equivalent, kills performance. But these are aggregation queries so there are only 5 records returned. I could just filter in my app code no problem. But why the hell does adding “where foo.id is not null” in SQL make it O(N*M)??? CTEs are not helping.
Have you checked with EXPLAIN ANALYZE VERBOSE?
Yup, used a nice EXPLAIN GUI tool as well to try and help.
Every time stuff like this comes up I wonder how much the people having issues would be willing to share - because every time I've fought with the postgres query planner, it eventually turned out what I wanted to do had massively worse performance* because of something I didn't take into account that postgres did. And each time, once I learned what that thing was, I was able to fix it the right way and get the query plan I was expecting, but also with the performance I was expecting.

* Usually I've been able to force query plans by disabling whole operations for a session, such as disallowing "sort" to make it use an index. The real fix in this case, for example, was to use CLUSTER to re-order data on disk, so the correlation statistic was close to 1 and postgres wanted to use the index instead of table scan + sort.

> CLUSTER

And here we see the benefit of clustered indices, á la MySQL. Assuming, of course, your PK is k-sortable.

It may use the same name, but reading https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.h... this doesn't really look like the same thing and wouldn't help here. InnoDB's clustered index (usually on the primary key) is used for fast row lookups by that primary key. It only has an advantage if the primary key is the order you want, but otherwise would have the exact same problem postgres's query planner was protecting me against, but without any way to fix the problem.
I always set this:

SET enable_seqscan = OFF;

If you hit the case I did, this will make your queries perform worse: The reason postgres insisted on table scan + sort was the random access time jumping around the disk and constantly invalidating the cache when it did an index scan. Using CLUSTER on the table made the index order match the table order so the index scan didn't jump around and there was no random access penalty - it instead worked with the disk cache.

If you're on an SSD and not a spinning disk, or have a lot of memory and can expect it to all be cached, there's a separate setting you can change to adjust the random access penalty - see random_page_cost on https://www.postgresql.org/docs/current/runtime-config-query...

It shouldn’t. At worst, if id doesn’t have a usable index, a sequential scan of foo, so O(N).
I haven’t used Postgres but is this the issue you’re talking about? https://github.com/launchbadge/sqlx/pull/1539
No this is a fundamental concept in postgres. If you do EXPLAIN ANALYZE on a query, you get the query plan, which is influenced by the query, indexes, table structure, etc. But the QP may decide to do a silly thing like a sequential scan where a better path exists, and adding an index to avoid the scan would be cost prohibitive. So if you could just override the QP and say "Use this index and do this type of sort and then this type of scan, damn the consequences!" it would give the query writer full control. I don't understand why you can't just force the system to follow a path - you as the developer pay for it regardless.
Nobody does actually. Postgres not having deterministic query plans is a big pain and a good reason not to use it. The same query may use different query plans depending on the estimated number of affected rows, very frustrating.
It's been many years since I've had to use an Oracle db, but they definitely allowed SQL developers to forcibly specify a query-plan - called "query hints", wherein you could specify it to prefer using (or avoiding) certain indexes, certain join-strategies (hash, loop, etc) - this was done via comments immediately before the SQL query - see https://docs.oracle.com/cd/B13789_01/server.101/b10752/hints...

Postgresql doesn't have this, however, I've rarely missed this feature - tables with good indexes, regular db hygiene, etc, almost always perform excellently. The query planner is very, very smart nowadays.

The query hints thing is actually quite useful in an educational context.

There's DB internals/performance course which I'm a TA for and we lean heavily on being able to force Oracle's rule-based or cost-based optimisers --- because then we can get the students to analyse and compare the plans.

I still find myself comparing it to oracle on these points.

As a user, postgres is a far more ergonomic database, but things like this show good old insane oracle is still strongly ahead on some of these points.

In practice, postgres is not as bad as you'd expect in the large, but it can still be anoying.

Oh that’s unfortunate, thanks for explaining it. Postgres has been in my list to check out but haven’t done any personal projects that I’d need it for… yet
Just make sure you test for your scale and with representative data and queries. There will be various tipping points with any technology. But you can find them with experimentation.
I wonder if for non-trivial use cases we should just go back to imperative programming.
The way I see it traditional databases are frameworks, and we need to switch to something more like libraries - use the high-level interface when we need it, but be able to dig underneath. Postgres has taken some small steps in this direction with e.g. making the parser available as a separate library; some newer systems (e.g. distributed-first datastores that combine LevelDB with some higher-level layer) go further.
IME, many query flips occur due to inadequate analysis, inadequate vacuuming, or table / index bloat.

Examining statistics for your tables / indices can be quite helpful in determining the issue.

MySQL offers this via use/force index (…). Similar problem where the QP will inexplicably decide one day to make a query really slow and you gotta override it.
We have a couple queries where using the correct index they'll take milliseconds, using the wrong index it'll take minutes - and mysql occasionally enough to be a noticeable load on the database chooses the wrong one and we've specified "USE INDEX" even though I really hate having to do so.
I'm curious about the query and schema that you're using