Hacker News new | ask | show | jobs
by morrbo 1986 days ago
I've always wondered and perhaps someone here might know....do postgres' CTEs translate into big select/sub select queries under the hood? Or are they something special entirely? Ie (forgive formatting as I'm on phone) does:

With mything as ( Select * from table where... ),

Myotherthing as ( Select * from mything where... )

Get translated to

Select * from (select * from ( select * from...)...)...)

So I'm just wondering are CTEs just easier to read, or do they offer any other known optimizations? We use them loads but mainly just for keeping larger queries easier to manage

3 comments

It's a bit hidden (or rather not pronounced enough) in the general docs but there are performance implications of using CTEs:

https://www.postgresql.org/docs/13/queries-with.html

> A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

> However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once. You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate computation of the WITH query, but it can still give a net savings if each usage of the WITH query needs only a small part of the WITH query's full output.

There are some examples below this text as well.

thanks very much for the reply, it was a case of RTFM on my end!
Not 100% about the newest versions of Postgres. But certainly in older versions CTEs created query planner boundaries.

So the planner would optimise each CTE separately, but wouldn’t optimise the entire query with all CTEs together, which of course can result in some slightly nonsensical query plans.

To my knowledge this is considered a limitation rather than a feature as it can cause performance issues with some queries.

EDIT: Some quick Googling suggests the CTEs are no longer optimisation fences, and their handling can be tweaked on a CTE by CTE basic in Postgres 12 onwards [1]

[1] https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...

In postgresql 12 a patch [0] is merged to remove the optimization fence certain conditions. From the changelog [1]

" Allow common table expressions (CTEs) to be inlined into the outer query (Andreas Karlsson, Andrew Gierth, David Fetter, Tom Lane)

Specifically, CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. Inlining can be prevented by specifying MATERIALIZED, or forced for multiply-referenced CTEs by specifying NOT MATERIALIZED. Previously, CTEs were never inlined and were always evaluated before the rest of the query. "

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... [1] https://www.postgresql.org/docs/12/release-12.html

In theory, both forms should get optimized similarly by the DB, but the practice will likely differ from database to database, and maybe even from DB version to DB version.

There are though things that CTEs can do and sub-selects can't (e.g. WITH RECURSIVE)

Didn't know about WITH RECURSIVE, very cool - thanks