Hacker News new | ask | show | jobs
by vladich 99 days ago
Postgres caches query plans too, the problem is you can only cache what you can share, and if your planner works well, you can share very little, there can be a lot of unique plans even for the same query
1 comments

No it cannot cache query plans between processes (connections) and the only way it can cache in the same process in the same connection is by the client manually preparing it, this was how the big boys did it 30 years ago, not anymore.

Was common guidance back in the day to use stored procedures for all application access code because they where cached in MSSQL (which PG doesn't even do). Then around 2000 it started caching based on statement text and that became much less important.

You would only used prepared statements if doing a bunch of inserts in a loop or something and it has a very small benefit now days only because its not sending the same text over the network over and over and hashing to lookup plan.

I didn't say it can cache between processes. The problem is not caching between processes, it's that caching itself is not very useful, because the planner creates different plans for different input parameters of the same query in the general case. So you can reliably cache plans only for the same sets of parameters. Or you can cache generic plans, which Postgres already does as well (and sharing that cache won't solve much of the problem too).
Other databases cache plans and have for years because it's very useful, many (most?) apps run many of the same statement with differing parameters, its a big win. They do this without the client having to figure out the statement matching logic like your various PG Orms and connection poolers try and do.

They also do things like auto parameterization if the statement doesn't have them and parameter sniffing to make multiple different plans based on different values where it makes sense.

https://learn.microsoft.com/en-us/sql/relational-databases/q...

You can also get this, add HINTs to control this behavior if you don't like it or its causing a problem in production, crazy I know.

https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...

PG is extremely primitive compared to these other systems in this area, and it has to be since it doesn't cache anything unless specifically instructed to for a single connection.

You make some unsubstantiated claims here. I assure you that it isn't as simple as you claim. And what Postgres does here is (mostly) the right thing, you can't do much better. You simply can't decide what plan you need to use based on the query and its parameters alone, unless you already cached that plan for those parameters (and even in that case you need to watch out for possible dramatic changes in statistics). Prepared statements != cached execution plans.
Ah yes so Microsoft and Oracle do these things for no good reason, you are the one making unsubstantiated claims such as "you can't do much better". And "You simply can't decide what plan you need to use based on the query and its parameters alone" which is mostly what those systems do (along with statistics). If you bothered to read what I linked you could see exactly how they are doing it.

I never said it was simple, in fact I said how primitive PG is compared to the "big boys" because they put huge effort into making their systems fast back in the TPS wars of the early 2000's on much slower hardware.

>Prepared statements != cached execution plans

Thats exactly what a prepared statement is:

https://en.wikipedia.org/wiki/Prepared_statement

There are reasons for that, it's useful in a very narrow set of situations. Postgres cached plans exist for the same reason. If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated. From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood. And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway. Go read it somewhere else.