Hacker News new | ask | show | jobs
by SigmundA 107 days ago
No it cannot cache query plans between processes (connections) and the only way it can cache in the same process in the same connection is by the client manually preparing it, this was how the big boys did it 30 years ago, not anymore.

Was common guidance back in the day to use stored procedures for all application access code because they where cached in MSSQL (which PG doesn't even do). Then around 2000 it started caching based on statement text and that became much less important.

You would only used prepared statements if doing a bunch of inserts in a loop or something and it has a very small benefit now days only because its not sending the same text over the network over and over and hashing to lookup plan.

1 comments

I didn't say it can cache between processes. The problem is not caching between processes, it's that caching itself is not very useful, because the planner creates different plans for different input parameters of the same query in the general case. So you can reliably cache plans only for the same sets of parameters. Or you can cache generic plans, which Postgres already does as well (and sharing that cache won't solve much of the problem too).
Other databases cache plans and have for years because it's very useful, many (most?) apps run many of the same statement with differing parameters, its a big win. They do this without the client having to figure out the statement matching logic like your various PG Orms and connection poolers try and do.

They also do things like auto parameterization if the statement doesn't have them and parameter sniffing to make multiple different plans based on different values where it makes sense.

https://learn.microsoft.com/en-us/sql/relational-databases/q...

You can also get this, add HINTs to control this behavior if you don't like it or its causing a problem in production, crazy I know.

https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

PG is extremely primitive compared to these other systems in this area, and it has to be since it doesn't cache anything unless specifically instructed to for a single connection.

You make some unsubstantiated claims here. I assure you that it isn't as simple as you claim. And what Postgres does here is (mostly) the right thing, you can't do much better. You simply can't decide what plan you need to use based on the query and its parameters alone, unless you already cached that plan for those parameters (and even in that case you need to watch out for possible dramatic changes in statistics). Prepared statements != cached execution plans.
Ah yes so Microsoft and Oracle do these things for no good reason, you are the one making unsubstantiated claims such as "you can't do much better". And "You simply can't decide what plan you need to use based on the query and its parameters alone" which is mostly what those systems do (along with statistics). If you bothered to read what I linked you could see exactly how they are doing it.

I never said it was simple, in fact I said how primitive PG is compared to the "big boys" because they put huge effort into making their systems fast back in the TPS wars of the early 2000's on much slower hardware.

>Prepared statements != cached execution plans

Thats exactly what a prepared statement is:

https://en.wikipedia.org/wiki/Prepared_statement

There are reasons for that, it's useful in a very narrow set of situations. Postgres cached plans exist for the same reason. If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated. From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood. And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway. Go read it somewhere else.
>There are reasons for that, it's useful in a very narrow set of situations.

So narrow its enabled by default for all statements from the "big boy" commercial RDBMS's...

https://www.ibm.com/docs/en/i/7.4.0?topic=overview-plan-cach...

https://docs.oracle.com/en/database/oracle/oracle-database/1...

https://learn.microsoft.com/en-us/sql/relational-databases/p...

https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c...

>Postgres cached plans exist for the same reason.

Postgresql doesn't cache plans unless the client explicitly sends commands to do so. Applications cannot take advantage of this unless they keep connections open and reuse them in a pool and they must mange this themselves. The plan has to be planned for every separate connection/process rather than a single cached planed increasing server memory costs which are plan cache size X number of connections.

It has no "reason" to cache plans the client must do this using its "reasons".

>If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated.

You are making all sorts of claims without nary a link to back it up. Are you suggestion PG does better than MSSQL, Oracle and DB2 in planning while be constrained to replan on every single statement? The PG planner is specifically kept simple so that it is fast at its job, not thorough or it would adversely effect execution time more than it already does, this is well documented and always a concern when new features are proposed for it.

>From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood.

Sticks and stones, is that all you have how about something substantial.

> And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway.

Ok Mr. Unsubstantiated are we talking about PG or not? What does one use prepared statements for in PG hmmm, you know the thing you call the PG plan cache? How about something besides your claim that prepared statements are not in fact plan caches? Are you talking about completely different DB systems? How about you substantiate that?