Hacker News new | ask | show | jobs
by SigmundA 99 days ago
Other databases cache plans and have for years because it's very useful, many (most?) apps run many of the same statement with differing parameters, its a big win. They do this without the client having to figure out the statement matching logic like your various PG Orms and connection poolers try and do.

They also do things like auto parameterization if the statement doesn't have them and parameter sniffing to make multiple different plans based on different values where it makes sense.

https://learn.microsoft.com/en-us/sql/relational-databases/q...

You can also get this, add HINTs to control this behavior if you don't like it or its causing a problem in production, crazy I know.

https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

PG is extremely primitive compared to these other systems in this area, and it has to be since it doesn't cache anything unless specifically instructed to for a single connection.

1 comments

You make some unsubstantiated claims here. I assure you that it isn't as simple as you claim. And what Postgres does here is (mostly) the right thing, you can't do much better. You simply can't decide what plan you need to use based on the query and its parameters alone, unless you already cached that plan for those parameters (and even in that case you need to watch out for possible dramatic changes in statistics). Prepared statements != cached execution plans.
Ah yes so Microsoft and Oracle do these things for no good reason, you are the one making unsubstantiated claims such as "you can't do much better". And "You simply can't decide what plan you need to use based on the query and its parameters alone" which is mostly what those systems do (along with statistics). If you bothered to read what I linked you could see exactly how they are doing it.

I never said it was simple, in fact I said how primitive PG is compared to the "big boys" because they put huge effort into making their systems fast back in the TPS wars of the early 2000's on much slower hardware.

>Prepared statements != cached execution plans

Thats exactly what a prepared statement is:

https://en.wikipedia.org/wiki/Prepared_statement

There are reasons for that, it's useful in a very narrow set of situations. Postgres cached plans exist for the same reason. If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated. From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood. And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway. Go read it somewhere else.
>There are reasons for that, it's useful in a very narrow set of situations.

So narrow its enabled by default for all statements from the "big boy" commercial RDBMS's...

https://www.ibm.com/docs/en/i/7.4.0?topic=overview-plan-cach...

https://docs.oracle.com/en/database/oracle/oracle-database/1...

https://learn.microsoft.com/en-us/sql/relational-databases/p...

https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c...

>Postgres cached plans exist for the same reason.

Postgresql doesn't cache plans unless the client explicitly sends commands to do so. Applications cannot take advantage of this unless they keep connections open and reuse them in a pool and they must mange this themselves. The plan has to be planned for every separate connection/process rather than a single cached planed increasing server memory costs which are plan cache size X number of connections.

It has no "reason" to cache plans the client must do this using its "reasons".

>If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated.

You are making all sorts of claims without nary a link to back it up. Are you suggestion PG does better than MSSQL, Oracle and DB2 in planning while be constrained to replan on every single statement? The PG planner is specifically kept simple so that it is fast at its job, not thorough or it would adversely effect execution time more than it already does, this is well documented and always a concern when new features are proposed for it.

>From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood.

Sticks and stones, is that all you have how about something substantial.

> And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway.

Ok Mr. Unsubstantiated are we talking about PG or not? What does one use prepared statements for in PG hmmm, you know the thing you call the PG plan cache? How about something besides your claim that prepared statements are not in fact plan caches? Are you talking about completely different DB systems? How about you substantiate that?

https://www.postgresql.org/docs/current/runtime-config-query...

and then

https://www.postgresql.org/docs/current/sql-prepare.html

Read carefully about "plan_cache_mode" and how it works (and its default settings). Sorry, that's my last message in this thread, and I'm still here just for educational purposes, because what you're talking about is in fact a common misconception. If you read it carefully, you'll see that generic plans do not require any "explicit commands", Postgres executes a query 5 times in custom mode, then tries a generic one, if it worked (not much worse than an average of 5 custom plans), the plan is cached. You can turn it off though. And I'd recommend to turn it off for most cases, because it's a pretty bad heuristics. Nevertheless, for some (pretty narrow set of) cases it's useful.

So, Mr Big Boy, now we can get to what a prepared statement in Postgres is. Prepared statements are cached in a session, but if that statement was cached in custom mode, it won't contain a plan. When Postgres receives a prepared statement in custom mode, it will just skip parsing, that's it. The query will still be planned, because custom plans rely on input parameters. If we run it in generic mode, then the plan is cached.

I think you should read carefully, this only applies to prepared statements within the same session, which is exactly what I have been saying. There is no global cache, and if you reset the session it's gone.

This controls whether prepared statements even use a cached plan at all. Other database can do this with hints and they can skip parsing by using stored procedures which are basically globally named prepared statements that the client can call without preparing a temporary one or they can do prepared but again this is typically a waste of time because parsing enough to match existing plans is fast (soft vs hard parse in Oracle speak). They have many more options with more powerful caching abilities that all clients can share across sessions.

The only time PG "automatically" caches the plan is when it implicitly prepares the plan within a PL/pgsql statement like doing a insert loop inside a function, its still is only for the current session. This is just part of the planning process in other databases that cache everything all the time globally.

You don't seem to understand that most other commercial "big-boy" RDBMS cache plans across sessions and that nothing has to be done for them to reuse between completely different connections with differing parameters and can still have specialized versions based on these parameters values vs a single generic plan.

At least now you admit prepared statements are in-fact a plan cache, contradicting your other statements, and seem to make a gotcha out of an option an option to disable that cache.

You can see various discussions on pg-hackers, here is one where the submitter confirms everything I have said and attempted to add the auto part but not tackle the much harder sharing between sessions part and was shot down, I don't believe much has change in PG around plan caching since this post and even has a guy that worked on DB2 talking about how they did it: https://www.postgresql.org/message-id/flat/8e76d8fc-8b8c-14b...

> then tries a generic one, if it worked (not much worse than an average of 5 custom plans), the plan is cached

Seems like it's not great at detecting this in all cases[1]. That said, I do note that was reproduced on PG16, perhaps they've made improvements since, given the documentation explicitly mentions what you said.

[1]: https://www.michal-drozd.com/en/blog/postgresql-prepared-sta...