| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rpedela 2244 days ago
	PG community has put a lot effort into performance the last few years, including JIT compilation in PG12. Is that criticism still true today?

5 comments

dtheodor 2244 days ago

It's very true today. The problem discussed here is performance on complex queries (e.g. subqueries), and the query planner plays a huge role in that. The Postgres query planner has various issues. Here are two recent posts talking about planner issues:

https://medium.com/@rbranson/10-things-i-hate-about-postgres...

https://www.cybertec-postgresql.com/en/things-could-be-impro...

Also JIT compilation, while very nice and a step in the right direction, is very barebones at the moment, hardly achieving its potential. Here's a long todo list of what and how to make efficient use of JIT in postgres, by the main author of the feature. https://twitter.com/AndresFreundTec/status/10025899696161996... Postgres release 12 did not add any JIT related improvements, and as far as I know no work has been done on it on 13 either.

ksec 2244 days ago

For someone not following closely, What are the reasons behind the lack of work on JIT since then?

ris 2243 days ago

The traditional view that the bottleneck for query performance is overwhelmingly disk throughput, which has become slightly less true with the advent of SSDs.

thda 2244 days ago

> as far as I know no work has been done on it on 13 either.

You should check the latest commitfest. Improvements are coming, at a steady pace.

sbuttgereit 2244 days ago

Are you sure?

I just made a very casual search of the most recently completed commitfest and I saw one entry ("JIT expression evaluation improvements") marked as "Moved to next CF" and in the upcoming one the only item planned was the one that was moved.

Maybe there's more there with less obvious names?

thda 2243 days ago

I am. 13 release is planned in the 3rd quarter. The next commitfest is in july.

unclebucknasty 2243 days ago

I would say performance is still an issue in certain cases. We had some long-running queries with views and subqueries running on a down level version of PG. We recently attempted an upgrade to 12 to address this and take advantage of the latest RDS generations. The ideal architectural solution would be to sunset the views (involving unions) in favor of a better normalized database, but you do what you can when you inherit legacy. And we're not talking about millions of rows here. It's a relatively small installment.

Anyway, to our surprise, not only did it provide little boost, but other queries that were simpler and previously caused no issues became inexplicably and intolerably slow.

We did not have time to go down the rabbit hole of query planning and explains, and being a highly dynamically queriable app, we didn't know what else was in store for other queries based on some combination of user inputs down the road.

So we rolled back. PG has otherwise been great for us. It really is the performance issue in certain situations that has caused pain.

One more note is with regard to what I would call extreme performance variability based on row counts. What I mean is there seems to be some threshold that, once crossed, causes some queries to go from perfectly fine to near-interminable. You expect some degradation as row counts go up, but here the performance behavior suddenly degrades in a nonlinear fashion. That kind of issue is difficult to tune.

ibejoeb 2244 days ago

One of the big changes in 12 allows the optimizer to work properly with CTEs, which was a major barrier to using the more expressive language features. There a good writeup here: https://paquier.xyz/postgresql-2/postgres-12-with-materializ...

doublesCs 2244 days ago

Thanks for that link.

> Historically we've always materialized the full output of a CTE query

Do you know if this means that CTEs are written to disk? I didn't think this was the case. (In the case of materialized views that qualifier means the view is written to disk).

da_chicken 2244 days ago

Here "materialization" just means that the query planner treats it as an optimization fence. That means the system doesn't do any optimization between the CTE and the query referencing it. It would prepare the output of the CTE as a completely separate entity essentially as if you had dumped it to a temp table. It may not be written to disk if there was sufficient memory, but either way you're sacrificing any optimization. I believe that it would even materialize the CTE multiple times if it was referenced multiple times, but don't quote me on that.

On the other hand, if the same query were written with subqueries instead of with CTEs, then the query optimizer would not treat it as an optimization fence. If it could utilize indexes or rewrite the query to be relationally equivalent, it would do so.

Note that sometimes that optimization fence is beneficial. There are situations where it's better to create temp tables and run smaller simpler queries instead of running an extremely complex monolithic query because the query planner isn't perfect even with hints. You can still enable that optimization fence functionality in PostgreSQL if you need to, but it's generally pretty rare that it happens like this. Still, you'll see stored procedures for reports still using temp tables and even cursors sometimes because they can be made to perform better in certain situations.

ibejoeb 2244 days ago

A simple example would be something like predicate pushdown:

    with mycte as (
        select a, count(b) 
        from foo 
        group by a
    )
    select *
    from mycte
    where a = 10;

There is enough information in the statement to plan it as

    with mycte as (
        select a, count(b) 
        from foo 
        where a = 10;
        group by a
    )
    select *
    from mycte;

Before, Postgres would have computed the aggregate for all values of `a`.

dtheodor 2244 days ago

It's in-memory materialization, which will spill to disk if it doesn't fit in memory.

Search for "Materialize node" in https://www.postgresql.org/docs/12/using-explain.html

mrits 2244 days ago

I left postgres v9 and have been working on Vertica. It's pretty surprising that Vertica came out with that WITH/WITHOUT MATERIALIZATION keywords 2 years ago.

__s 2244 days ago

& more's coming. PG13 has improved hash aggregation[0] & List (which is used throughout) has been updated from being a linked list to being an array

0: https://www.postgresql.org/message-id/507ac540ec7c20136364b5...

asah 2244 days ago

I'd really like to know this as well !!