| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by SigmundA 147 days ago

Ah yes so Microsoft and Oracle do these things for no good reason, you are the one making unsubstantiated claims such as "you can't do much better". And "You simply can't decide what plan you need to use based on the query and its parameters alone" which is mostly what those systems do (along with statistics). If you bothered to read what I linked you could see exactly how they are doing it.

I never said it was simple, in fact I said how primitive PG is compared to the "big boys" because they put huge effort into making their systems fast back in the TPS wars of the early 2000's on much slower hardware.

>Prepared statements != cached execution plans

Thats exactly what a prepared statement is:

https://en.wikipedia.org/wiki/Prepared_statement

1 comments

vladich 147 days ago

There are reasons for that, it's useful in a very narrow set of situations. Postgres cached plans exist for the same reason. If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated. From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood. And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway. Go read it somewhere else.

link

SigmundA 147 days ago

>There are reasons for that, it's useful in a very narrow set of situations.

So narrow its enabled by default for all statements from the "big boy" commercial RDBMS's...

https://www.ibm.com/docs/en/i/7.4.0?topic=overview-plan-cach...

https://docs.oracle.com/en/database/oracle/oracle-database/1...

https://learn.microsoft.com/en-us/sql/relational-databases/p...

https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c...

>Postgres cached plans exist for the same reason.

Postgresql doesn't cache plans unless the client explicitly sends commands to do so. Applications cannot take advantage of this unless they keep connections open and reuse them in a pool and they must mange this themselves. The plan has to be planned for every separate connection/process rather than a single cached planed increasing server memory costs which are plan cache size X number of connections.

It has no "reason" to cache plans the client must do this using its "reasons".

>If you're claiming Oracle and MSSQL do _much_ better in this area - that's what I call unsubstantiated.

You are making all sorts of claims without nary a link to back it up. Are you suggestion PG does better than MSSQL, Oracle and DB2 in planning while be constrained to replan on every single statement? The PG planner is specifically kept simple so that it is fast at its job, not thorough or it would adversely effect execution time more than it already does, this is well documented and always a concern when new features are proposed for it.

>From what you write further it's pretty clear you don't have a lot of understanding what happens under the hood.

Sticks and stones, is that all you have how about something substantial.

> And no, prepared statements are not what you read in Wikipedia. Not in all databases anyway.

Ok Mr. Unsubstantiated are we talking about PG or not? What does one use prepared statements for in PG hmmm, you know the thing you call the PG plan cache? How about something besides your claim that prepared statements are not in fact plan caches? Are you talking about completely different DB systems? How about you substantiate that?

link

vladich 146 days ago

https://www.postgresql.org/docs/current/runtime-config-query...

and then

https://www.postgresql.org/docs/current/sql-prepare.html

Read carefully about "plan_cache_mode" and how it works (and its default settings). Sorry, that's my last message in this thread, and I'm still here just for educational purposes, because what you're talking about is in fact a common misconception. If you read it carefully, you'll see that generic plans do not require any "explicit commands", Postgres executes a query 5 times in custom mode, then tries a generic one, if it worked (not much worse than an average of 5 custom plans), the plan is cached. You can turn it off though. And I'd recommend to turn it off for most cases, because it's a pretty bad heuristics. Nevertheless, for some (pretty narrow set of) cases it's useful.

So, Mr Big Boy, now we can get to what a prepared statement in Postgres is. Prepared statements are cached in a session, but if that statement was cached in custom mode, it won't contain a plan. When Postgres receives a prepared statement in custom mode, it will just skip parsing, that's it. The query will still be planned, because custom plans rely on input parameters. If we run it in generic mode, then the plan is cached.

link

SigmundA 146 days ago

I think you should read carefully, this only applies to prepared statements within the same session, which is exactly what I have been saying. There is no global cache, and if you reset the session it's gone.

This controls whether prepared statements even use a cached plan at all. Other database can do this with hints and they can skip parsing by using stored procedures which are basically globally named prepared statements that the client can call without preparing a temporary one or they can do prepared but again this is typically a waste of time because parsing enough to match existing plans is fast (soft vs hard parse in Oracle speak). They have many more options with more powerful caching abilities that all clients can share across sessions.

The only time PG "automatically" caches the plan is when it implicitly prepares the plan within a PL/pgsql statement like doing a insert loop inside a function, its still is only for the current session. This is just part of the planning process in other databases that cache everything all the time globally.

You don't seem to understand that most other commercial "big-boy" RDBMS cache plans across sessions and that nothing has to be done for them to reuse between completely different connections with differing parameters and can still have specialized versions based on these parameters values vs a single generic plan.

At least now you admit prepared statements are in-fact a plan cache, contradicting your other statements, and seem to make a gotcha out of an option an option to disable that cache.

You can see various discussions on pg-hackers, here is one where the submitter confirms everything I have said and attempted to add the auto part but not tackle the much harder sharing between sessions part and was shot down, I don't believe much has change in PG around plan caching since this post and even has a guy that worked on DB2 talking about how they did it: https://www.postgresql.org/message-id/flat/8e76d8fc-8b8c-14b...

link

vladich 146 days ago

Sure, but that's not the main issue. If you add a global cache, it will have only a marginal value. There are Postgres extensions / forks with global cache and they are not wildly more efficient. The main issue you still do not understand is for different parameters you _need_ different plans, and caching doesn't help with that. It can help with parsing, sure. Parsing is very fast though, in relation to planning. And you keep conflating "prapared" statements with plan caching. Ok.

link

SigmundA 146 days ago

>If you add a global cache, it will have only a marginal value

Please substantiate this, again all other major commercial RDBMS's do this and have invested a lot of effort and money into these systems, they would not do something that has marginal value.

Again I went through the era of needing to manually prepare queries in client code when it was the only choice as it is now in PG. It was not a marginal improvement when automatic global caching became available, it was objectively measurable via industry standard benchmarks.

You can also find other post complaining about prepared statement cache memory usage especially when libs and pooler auto prepare, the cache is repeated for every connection, 100 connections equals 100X cache size. Another advantage of a shared cache, this is obvious.

I will leave you with a quote from Bruce Momjian, you know one of the founding members of the PG dev team, in the thread I linked that you didn't seem to read just like the other links I gave you:

"I think everyone agrees on the Desirability of the feature, but the Design is the tricky part."

>The main issue you still do not understand is for different parameters you _need_ different plans, and caching doesn't help with that.

You still don't seem to be grasping what other more advanced systems do here and again don't seem to be reading any of the existing literature I am giving you. These systems will make different plans if they detect its necessary, they have MULTIPLE cached plans of the same statement and you can examine their caches and see stats on their usage.

These systems also have hints that let you disable, force a single generic, tell it how you want to evaluate specific parameters for unknown values, specific hard coded values etc. if you want to override their default behavior that uses statistics and heuristic to make a determination of which plan to use.

Please I beg you read what a modern commercial DB can do here and stop saying it doesn't help or can't be done, here is a direct link: https://learn.microsoft.com/en-us/sql/relational-databases/p...

>And you keep conflating "prapared" statements with plan caching.

Again we are talking about PG and the only way PG caches a plan is using prepared statements, in PG prepared statements and plan caching are the same thing, there is no other choice.

From your own link trying to gotcha me on PG plan caching config, first sentence of plan_cache_mode: "Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed using custom or generic plans."

The only other things a prepared statement does is skip parsing, which is another part of caching, and reduce network traffic from client to server. These things can be done with stored procedures in systems that have global caches and are shared across all connections and these systems still support the very rare situation of using a prepared statement, its almost vestigial now days.

Here is Microsoft Guidance on prepared statements in MSSQL now days:

"In SQL Server, the prepare/execute model has no significant performance advantage over direct execution, because of the way SQL Server reuses execution plans. SQL Server has efficient algorithms for matching current Transact-SQL statements with execution plans that are generated for prior executions of the same Transact-SQL statement. If an application executes a Transact-SQL statement with parameter markers multiple times, SQL Server will reuse the execution plan from the first execution for the second and subsequent executions (unless the plan ages from the plan cache)."

https://learn.microsoft.com/en-us/sql/relational-databases/q...

link

magicalhippo 146 days ago

> then tries a generic one, if it worked (not much worse than an average of 5 custom plans), the plan is cached

Seems like it's not great at detecting this in all cases[1]. That said, I do note that was reproduced on PG16, perhaps they've made improvements since, given the documentation explicitly mentions what you said.

[1]: https://www.michal-drozd.com/en/blog/postgresql-prepared-sta...

link

vladich 146 days ago

That's exactly what I said above - just turn this thing off. The reason is that even if your generic plan is better than 5 custom plans before it, that doesn't guarantee much. With probability high enough to cause troubles, it's just a coincidence, and generic plans in general tend to be very bad (because they use some hardcoded constants instead of statistics for planning).

This behavior is often a source of random latency spikes, when your queries suddenly start misbehaving, and then suddenly stop doing it. If you don't have auto_explain on, it will look like mysterious glitches in production.

The few cases when they are useful are very simple ones, like single table selects by index. They are already fast, and with generic plans you can cut planning time completely. Which is kinda...not much. There are more complicated cases where they are useful, involving Postgres forks like AWS Aurora, which has query plan management subsystem, allowing to store plans directly. Then you can cut planning time for them. But that's a completely different story.

link