Hacker News new | ask | show | jobs
by ibejoeb 2244 days ago
One of the big changes in 12 allows the optimizer to work properly with CTEs, which was a major barrier to using the more expressive language features. There a good writeup here: https://paquier.xyz/postgresql-2/postgres-12-with-materializ...
2 comments

Thanks for that link.

> Historically we've always materialized the full output of a CTE query

Do you know if this means that CTEs are written to disk? I didn't think this was the case. (In the case of materialized views that qualifier means the view is written to disk).

Here "materialization" just means that the query planner treats it as an optimization fence. That means the system doesn't do any optimization between the CTE and the query referencing it. It would prepare the output of the CTE as a completely separate entity essentially as if you had dumped it to a temp table. It may not be written to disk if there was sufficient memory, but either way you're sacrificing any optimization. I believe that it would even materialize the CTE multiple times if it was referenced multiple times, but don't quote me on that.

On the other hand, if the same query were written with subqueries instead of with CTEs, then the query optimizer would not treat it as an optimization fence. If it could utilize indexes or rewrite the query to be relationally equivalent, it would do so.

Note that sometimes that optimization fence is beneficial. There are situations where it's better to create temp tables and run smaller simpler queries instead of running an extremely complex monolithic query because the query planner isn't perfect even with hints. You can still enable that optimization fence functionality in PostgreSQL if you need to, but it's generally pretty rare that it happens like this. Still, you'll see stored procedures for reports still using temp tables and even cursors sometimes because they can be made to perform better in certain situations.

A simple example would be something like predicate pushdown:

    with mycte as (
        select a, count(b) 
        from foo 
        group by a
    )
    select *
    from mycte
    where a = 10;
There is enough information in the statement to plan it as

    with mycte as (
        select a, count(b) 
        from foo 
        where a = 10;
        group by a
    )
    select *
    from mycte;
Before, Postgres would have computed the aggregate for all values of `a`.
It's in-memory materialization, which will spill to disk if it doesn't fit in memory.

Search for "Materialize node" in https://www.postgresql.org/docs/12/using-explain.html

I left postgres v9 and have been working on Vertica. It's pretty surprising that Vertica came out with that WITH/WITHOUT MATERIALIZATION keywords 2 years ago.